{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T03:44:12Z","timestamp":1771904652804,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2025,1,14]],"date-time":"2025-01-14T00:00:00Z","timestamp":1736812800000},"content-version":"vor","delay-in-days":53,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>This study addresses the challenging task of identifying viruses within metagenomic data, which encompasses a broad array of biological samples, including animal reservoirs, environmental sources, and the human body. Traditional methods for virus identification often face limitations due to the diversity and rapid evolution of viral genomes. In response, recent efforts have focused on leveraging artificial intelligence (AI) techniques to enhance accuracy and efficiency in virus detection. However, existing AI-based approaches are primarily binary classifiers, lacking specificity in identifying viral types and reliant on nucleotide sequences. To address these limitations, VirDetect-AI, a novel tool specifically designed for the identification of eukaryotic viruses within metagenomic datasets, is introduced. The VirDetect-AI model employs a combination of convolutional neural networks and residual neural networks to effectively extract hierarchical features and detailed patterns from complex amino acid genomic data. The results demonstrated that the model has outstanding results in all metrics, with a sensitivity of 0.97, a precision of 0.98, and an F1-score of 0.98. VirDetect-AI improves our comprehension of viral ecology and can accurately classify metagenomic sequences into 980 viral protein classes, hence enabling the identification of new viruses. These classes encompass an extensive array of viral genera and families, as well as protein functions and hosts.<\/jats:p>","DOI":"10.1093\/bib\/bbaf001","type":"journal-article","created":{"date-parts":[[2025,1,14]],"date-time":"2025-01-14T13:48:21Z","timestamp":1736862501000},"source":"Crossref","is-referenced-by-count":8,"title":["VirDetect-AI: a residual and convolutional neural network\u2013based metagenomic tool for eukaryotic viral protein identification"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-5407-6598","authenticated-orcid":false,"given":"Alida","family":"Z\u00e1rate","sequence":"first","affiliation":[{"name":"Doctorado en Ciencias, Instituto de Investigaci\u00f3n en Ciencias B\u00e1sicas Aplicadas (IICBA), Universidad Aut\u00f3noma del Estado de Morelos , Cuernavaca, Morelos 62210 ,","place":["M\u00e9xico"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1577-5629","authenticated-orcid":false,"given":"Lorena","family":"D\u00edaz-Gonz\u00e1lez","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Ciencias, Universidad Aut\u00f3noma del Estado de Morelos , Cuernavaca, Morelos 62210 ,","place":["M\u00e9xico"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1896-5962","authenticated-orcid":false,"given":"Blanca","family":"Taboada","sequence":"additional","affiliation":[{"name":"Departamento de Gen\u00e9tica del Desarrollo y Fisiolog\u00eda Molecular, Instituto de Biotecnolog\u00eda, Universidad Nacional Aut\u00f3noma de M\u00e9xico , Cuernavaca, Morelos 62210 ,","place":["M\u00e9xico"]}]}],"member":"286","published-online":{"date-parts":[[2025,1,14]]},"reference":[{"key":"2025011413480573100_ref1","doi-asserted-by":"publisher","first-page":"2140","DOI":"10.3390\/foods12112140","article-title":"Metagenomics: an effective approach for exploring microbial diversity and functions","volume":"12","author":"Nam","year":"2023","journal-title":"Foods"},{"key":"2025011413480573100_ref2","doi-asserted-by":"publisher","DOI":"10.1186\/s40168-017-0283-5","article-title":"VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data","volume":"5","author":"Ren","year":"2017","journal-title":"Microbiome"},{"key":"2025011413480573100_ref3","doi-asserted-by":"publisher","first-page":"336","DOI":"10.1186\/s12859-018-2340-x","article-title":"Machine learning for detection of viral sequences in human metagenomic datasets","volume":"19","author":"Bzhalava","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2025011413480573100_ref4","doi-asserted-by":"publisher","first-page":"119641","DOI":"10.1016\/j.eswa.2023.119641","article-title":"Viral genome prediction from raw human DNA sequence samples by combining natural language processing and machine learning techniques","volume":"218","author":"Alshayeji","year":"2023","journal-title":"Expert Syst Appl"},{"key":"2025011413480573100_ref5","doi-asserted-by":"publisher","DOI":"10.1186\/s40168-020-00990-y","article-title":"VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses","volume":"9","author":"Guo","year":"2021","journal-title":"Microbiome"},{"key":"2025011413480573100_ref6","doi-asserted-by":"publisher","first-page":"e0222271","DOI":"10.1371\/journal.pone.0222271","article-title":"ViraMiner: deep learning on raw DNA sequences for identifying viral genomes in human samples","volume":"14","author":"Tampuu","year":"2019","journal-title":"PLoS One"},{"key":"2025011413480573100_ref7","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1007\/s40484-019-0187-4","article-title":"Identifying viruses from metagenomic data using deep learning","volume":"8","author":"Ren","year":"2020","journal-title":"Quant Biol"},{"key":"2025011413480573100_ref8","doi-asserted-by":"crossref","first-page":"3002","DOI":"10.1007\/s10489-021-02572-3","article-title":"Explainable deep neural networks for novel viral genome prediction","volume":"52","author":"Dasari","year":"2021","journal-title":"Appl Intell"},{"key":"2025011413480573100_ref9","volume-title":"Proceedings 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"Zhang","year":"2022"},{"key":"2025011413480573100_ref10","doi-asserted-by":"publisher","first-page":"1216","DOI":"10.1093\/bioinformatics\/btab845","article-title":"Virtifier: a deep learning-based identifier for viral sequences from metagenomes","volume":"38","author":"Miao","year":"2021","journal-title":"Bioinformatics"},{"key":"2025011413480573100_ref11","doi-asserted-by":"publisher","first-page":"763","DOI":"10.1109\/TCBB.2022.3161135","article-title":"Virsearcher: identifying bacteriophages from metagenomes by combining convolutional neural network and gene information","volume":"20","author":"Liu","year":"2023","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2025011413480573100_ref12","doi-asserted-by":"publisher","first-page":"D20","DOI":"10.1093\/nar\/gkab1112","article-title":"Database resources of the national center for biotechnology information","volume":"50","author":"Sayers","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025011413480573100_ref13","doi-asserted-by":"publisher","first-page":"e1007314","DOI":"10.1371\/journal.ppat.1007314","article-title":"A planarian nidovirus expands the limits of RNA genome size","volume":"14","author":"Saberi","year":"2018","journal-title":"PLoS Pathog"},{"key":"2025011413480573100_ref14","doi-asserted-by":"publisher","first-page":"D708","DOI":"10.1093\/nar\/gkx932","article-title":"Virus taxonomy: the database of the international committee on taxonomy of viruses (ICTV)","volume":"46","author":"Lefkowitz","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2025011413480573100_ref15","doi-asserted-by":"publisher","first-page":"D576","DOI":"10.1093\/nar\/gkq901","article-title":"ViralZone: a knowledge resource to understand virus diversity","volume":"39","author":"Hulo","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2025011413480573100_ref16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3131611","article-title":"Comparative analysis of sequence clustering methods for deduplication of biological databases","volume":"9","author":"Chen","year":"2017","journal-title":"J Data Inf Qual"},{"key":"2025011413480573100_ref17","doi-asserted-by":"publisher","first-page":"282","DOI":"10.1093\/bioinformatics\/17.3.282","article-title":"Clustering of highly homologous sequences to reduce the size of large protein databases","volume":"17","author":"Li","year":"2001","journal-title":"Bioinformatics"},{"key":"2025011413480573100_ref18","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2025011413480573100_ref19","volume-title":"Proceedings 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"He"},{"key":"2025011413480573100_ref20","volume-title":"Keras","author":"Chollet","year":"2015"},{"key":"2025011413480573100_ref21","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1605.08695","article-title":"TensorFlow: a system for large-scale machine learning","author":"Abadi","year":"2016"},{"key":"2025011413480573100_ref22","doi-asserted-by":"publisher","article-title":"Generalized cross entropy loss for training deep neural networks with noisy labels","author":"Zhang","DOI":"10.48550\/arXiv.1805.07836"},{"key":"2025011413480573100_ref23","article-title":"Adam: a method for stochastic optimization","author":"Kingma"},{"key":"2025011413480573100_ref24","article-title":"Experiment tracking with weights and biases.","volume-title":"Weights & Biases","author":"Biewald","year":"2020"},{"key":"2025011413480573100_ref25","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1007\/s40745-020-00253-5","article-title":"A comprehensive survey of loss functions in machine learning","volume":"9","author":"Wang","year":"2020","journal-title":"Ann Data Sci"},{"key":"2025011413480573100_ref26","doi-asserted-by":"publisher","first-page":"e0163962","DOI":"10.1371\/journal.pone.0163962","article-title":"SeqKit: a cross-platform and ultrafast toolkit for FASTA\/Q file manipulation","volume":"11","author":"Shen","year":"2016","journal-title":"PLoS One"},{"key":"2025011413480573100_ref27","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-11-119","article-title":"Prodigal: prokaryotic gene recognition and translation initiation site identification","volume":"11","author":"Hyatt","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2025011413480573100_ref28","doi-asserted-by":"publisher","first-page":"i884","DOI":"10.1093\/bioinformatics\/bty560","article-title":"fastp: an ultra-fast all-in-one FASTQ preprocessor","volume":"34","author":"Chen","year":"2018","journal-title":"Bioinformatics"},{"key":"2025011413480573100_ref29","doi-asserted-by":"publisher","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2025011413480573100_ref30","doi-asserted-by":"publisher","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol"},{"key":"2025011413480573100_ref31","doi-asserted-by":"publisher","first-page":"792","DOI":"10.1186\/s12879-022-07783-8","article-title":"Metagenomic analysis reveals differences in the co-occurrence and abundance of viral species in SARS-CoV-2 patients with different severity of disease","volume":"22","author":"I\u0161a","year":"2022","journal-title":"BMC Infect Dis"},{"key":"2025011413480573100_ref32","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J Comput Biol"},{"key":"2025011413480573100_ref33","doi-asserted-by":"publisher","first-page":"938","DOI":"10.1038\/s41598-022-26707-9","article-title":"The fecal and oropharyngeal eukaryotic viromes of healthy infants during the first year of life are personal","volume":"13","author":"Rivera-Guti\u00e9rrez","year":"2023","journal-title":"Sci Rep"},{"key":"2025011413480573100_ref34","doi-asserted-by":"publisher","first-page":"2091","DOI":"10.1007\/s00253-018-8795-x","article-title":"Impact of gut-associated bifidobacteria and their phages on health: two sides of the same coin?","volume":"102","author":"Mahony","year":"2018","journal-title":"Appl Microbiol Biotechnol"},{"key":"2025011413480573100_ref35","doi-asserted-by":"publisher","first-page":"454","DOI":"10.1099\/jgv.0.001409","article-title":"ICTV Virus Taxonomy Profile: Botourmiaviridae","volume":"101","author":"Ayll\u00f3n","year":"2020","journal-title":"J Gen Virol"},{"key":"2025011413480573100_ref36","doi-asserted-by":"publisher","first-page":"2525","DOI":"10.1099\/vir.0.013086-0","article-title":"Molecular characterization of the plant virus genus Ourmiavirus and evidence of inter-kingdom reassortment of viral genome segments as its possible route of origin","volume":"90","author":"Rastgou","year":"2009","journal-title":"J Gen Virol"},{"key":"2025011413480573100_ref37","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1159\/000499088","article-title":"Viral metagenomics of blood donors and blood-derived products using next-generation sequencing","volume":"46","author":"Waldvogel-Abramowski","year":"2019","journal-title":"Transfus Med Hemother"},{"key":"2025011413480573100_ref38","doi-asserted-by":"publisher","first-page":"D384","DOI":"10.1093\/nar\/gkac1096","article-title":"The conserved domain database in 2023","volume":"51","author":"Wang","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025011413480573100_ref39","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btac845","article-title":"KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping","volume":"39","author":"Shen","year":"2022","journal-title":"Bioinformatics"},{"key":"2025011413480573100_ref40","doi-asserted-by":"publisher","DOI":"10.3389\/fmicb.2020.01690","article-title":"Human endogenous retrovirus K (HML-2) in health and disease","volume":"11","author":"Xue","year":"2020","journal-title":"Front Microbiol"},{"key":"2025011413480573100_ref41","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1093\/molbev\/msr226","article-title":"Fossil Rhabdoviral sequences integrated into arthropod genomes: ontogeny, evolution, and potential functionality","volume":"29","author":"Fort","year":"2011","journal-title":"Mol Biol Evol"},{"key":"2025011413480573100_ref42","doi-asserted-by":"publisher","first-page":"e1001030","DOI":"10.1371\/journal.ppat.1001030","article-title":"Unexpected inheritance: multiple integrations of ancient Bornavirus and ebolavirus\/Marburgvirus sequences in vertebrate genomes","volume":"6","author":"Belyi","year":"2010","journal-title":"PLoS Pathog"},{"key":"2025011413480573100_ref43","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1016\/j.virol.2015.02.039","article-title":"Origins and evolution of viruses of eukaryotes: the ultimate modularity","volume":"479-480","author":"Koonin","year":"2015","journal-title":"Virology"},{"key":"2025011413480573100_ref44","doi-asserted-by":"publisher","first-page":"1731","DOI":"10.1080\/10643389.2023.2181620","article-title":"Recommendations for the use of metagenomics for routine monitoring of antibiotic resistance in wastewater and impacted aquatic environments","volume":"53","author":"Davis","year":"2023","journal-title":"Crit Rev Environ Sci Technol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbaf001\/61435915\/bbaf001.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbaf001\/61435915\/bbaf001.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,14]],"date-time":"2025-01-14T13:48:25Z","timestamp":1736862505000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf001\/7953916"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,22]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf001","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,1]]},"published":{"date-parts":[[2024,11,22]]},"article-number":"bbaf001"}}