{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T14:39:59Z","timestamp":1768919999870,"version":"3.49.0"},"reference-count":45,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2022,6,4]],"date-time":"2022-06-04T00:00:00Z","timestamp":1654300800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Ministry of Science","award":["2017M3A9F3041232"],"award-info":[{"award-number":["2017M3A9F3041232"]}]},{"name":"Institute of Information and Communications Technology Planning & Evaluation"},{"DOI":"10.13039\/501100014188","name":"MSIT","doi-asserted-by":"publisher","award":["2020-0-01373"],"award-info":[{"award-number":["2020-0-01373"]}],"id":[{"id":"10.13039\/501100014188","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Viruses are ubiquitous in humans and various environments and continually mutate themselves. Identifying viruses in an environment without cultivation is challenging; however, promoting the screening of novel viruses and expanding the knowledge of viral space is essential. Homology-based methods that identify viruses using known viral genomes rely on sequence alignments, making it difficult to capture remote homologs of the known viruses. To accurately capture viral signals from metagenomic samples, models are needed to understand the patterns encoded in the viral genomes. In this study, we developed a hierarchical BERT model named ViBE to detect eukaryotic viruses from metagenome sequencing data and classify them at the order level. We pre-trained ViBE using read-like sequences generated from the virus reference genomes and derived three fine-tuned models that classify paired-end reads to orders for eukaryotic deoxyribonucleic acid viruses and eukaryotic ribonucleic acid viruses. ViBE achieved higher recall than state-of-the-art alignment-based methods while maintaining comparable precision. ViBE outperformed state-of-the-art alignment-free methods for all test cases. The performance of ViBE was also verified using real sequencing datasets, including the vaginal virome.<\/jats:p>","DOI":"10.1093\/bib\/bbac204","type":"journal-article","created":{"date-parts":[[2022,5,4]],"date-time":"2022-05-04T11:19:51Z","timestamp":1651663191000},"source":"Crossref","is-referenced-by-count":31,"title":["ViBE: a hierarchical BERT model to identify eukaryotic viruses using metagenome sequencing data"],"prefix":"10.1093","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7765-0827","authenticated-orcid":false,"given":"Ho-Jin","family":"Gwak","sequence":"first","affiliation":[{"name":"Department of Computer Science, Hanyang University , Seoul, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2724-9477","authenticated-orcid":false,"given":"Mina","family":"Rho","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hanyang University , Seoul, Korea"},{"name":"Department of Biomedical Informatics, Hanyang University , Seoul, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,6,4]]},"reference":[{"key":"2022071906083821500_ref1","doi-asserted-by":"crossref","first-page":"1795","DOI":"10.1126\/science.1127404","article-title":"Metagenomic analysis of coastal RNA virus communities","volume":"312","author":"Culley","year":"2006","journal-title":"Science"},{"key":"2022071906083821500_ref2","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1186\/s12915-014-0071-7","article-title":"Metagenomic analysis of double-stranded DNA viruses in healthy adults","volume":"12","author":"Wylie","year":"2014","journal-title":"BMC Biol"},{"key":"2022071906083821500_ref3","doi-asserted-by":"crossref","first-page":"8686","DOI":"10.1038\/s41598-018-26851-1","article-title":"Metagenomics detection and characterisation of viruses in faecal samples from Australian wild birds","volume":"8","author":"Vibin","year":"2018","journal-title":"Sci Rep"},{"key":"2022071906083821500_ref4","doi-asserted-by":"crossref","DOI":"10.1128\/JCM.01123-18","article-title":"Detection of viruses in clinical samples by use of metagenomic sequencing and targeted sequence capture","volume":"56","author":"Wylie","year":"2018","journal-title":"J Clin Microbiol"},{"key":"2022071906083821500_ref5","doi-asserted-by":"crossref","first-page":"2403","DOI":"10.3389\/fmicb.2019.02403","article-title":"A review on viral metagenomics in extreme environments","volume":"10","author":"Davila-Ramos","year":"2019","journal-title":"Front Microbiol"},{"key":"2022071906083821500_ref6","doi-asserted-by":"crossref","first-page":"1951","DOI":"10.3389\/fmicb.2019.01951","article-title":"Metagenomic analysis of the diversity of DNA viruses in the surface and deep sea of the South China Sea","volume":"10","author":"Liang","year":"2019","journal-title":"Front Microbiol"},{"key":"2022071906083821500_ref7","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1038\/nrmicro.2016.177","article-title":"Consensus statement: virus taxonomy in the age of metagenomics","volume":"15","author":"Simmonds","year":"2017","journal-title":"Nat Rev Microbiol"},{"key":"2022071906083821500_ref8","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1186\/s12864-016-2446-3","article-title":"ViromeScan: a new tool for metagenomic viral community profiling","volume":"17","author":"Rampelli","year":"2016","journal-title":"BMC Genom"},{"key":"2022071906083821500_ref9","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.virol.2017.01.005","article-title":"VirusSeeker, a computational pipeline for virus discovery and virome composition analysis","volume":"503","author":"Zhao","year":"2017","journal-title":"Virology"},{"key":"2022071906083821500_ref10","doi-asserted-by":"crossref","first-page":"28428","DOI":"10.1038\/srep28428","article-title":"Assessing viral taxonomic composition in benthic marine ecosystems: reliability and efficiency of different bioinformatic tools for viral metagenomic analyses","volume":"6","author":"Tangherlini","year":"2016","journal-title":"Sci Rep"},{"key":"2022071906083821500_ref11","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1099\/00207713-44-4-846","article-title":"A place for DNA-DNA reassociation and 16S ribosomal-RNA sequence-analysis in the present species definition in bacteriology","volume":"44","author":"Stackebrandt","year":"1994","journal-title":"Int J Syst Bacteriol"},{"key":"2022071906083821500_ref12","doi-asserted-by":"crossref","first-page":"e66213","DOI":"10.1371\/journal.pone.0066213","article-title":"A DNA-based registry for all animal species: the barcode index number (BIN) system","volume":"8","author":"Ratnasingham","year":"2013","journal-title":"PLoS One"},{"key":"2022071906083821500_ref13","doi-asserted-by":"crossref","first-page":"35275","DOI":"10.1038\/srep35275","article-title":"Molecular evolution of a widely-adopted taxonomic marker (COI) across the animal tree of life","volume":"6","author":"Pentinsaari","year":"2016","journal-title":"Sci Rep"},{"key":"2022071906083821500_ref14","doi-asserted-by":"crossref","first-page":"3396","DOI":"10.1093\/bioinformatics\/btx440","article-title":"VICTOR: genome-based phylogeny and classification of prokaryotic viruses","volume":"33","author":"Meier-Kolthoff","year":"2017","journal-title":"Bioinformatics"},{"key":"2022071906083821500_ref15","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2022071906083821500_ref16","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1186\/s13059-019-1891-0","article-title":"Improved metagenomic analysis with Kraken 2","volume":"20","author":"Wood","year":"2019","journal-title":"Genome Biol"},{"key":"2022071906083821500_ref17","doi-asserted-by":"crossref","first-page":"2329","DOI":"10.1093\/bioinformatics\/bth324","article-title":"Whole-genome prokaryotic phylogeny","volume":"21","author":"Henz","year":"2005","journal-title":"Bioinformatics"},{"key":"2022071906083821500_ref18","doi-asserted-by":"crossref","first-page":"e3243","DOI":"10.7717\/peerj.3243","article-title":"vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect archaea and bacteria","volume":"5","author":"Bolduc","year":"2017","journal-title":"PeerJ"},{"key":"2022071906083821500_ref19","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1186\/s12859-017-1602-3","article-title":"A machine learning approach for viral genome classification","volume":"18","author":"Remita","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2022071906083821500_ref20","doi-asserted-by":"crossref","first-page":"81297","DOI":"10.1109\/ACCESS.2019.2923687","article-title":"Viral Genome Deep Classifier","volume":"7","author":"Fabijanska","year":"2019","journal-title":"IEEE Access"},{"key":"2022071906083821500_ref21","doi-asserted-by":"crossref","first-page":"3209","DOI":"10.1038\/s41598-021-82043-4","article-title":"PACIFIC: a lightweight deep-learning classifier of SARS-CoV-2 and co-infecting RNA viruses","volume":"11","author":"Acera Mateos","year":"2021","journal-title":"Sci Rep"},{"key":"2022071906083821500_ref22","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1186\/s40168-020-00990-y","article-title":"VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses","volume":"9","author":"Guo","year":"2021","journal-title":"Microbiome"},{"key":"2022071906083821500_ref23","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1007\/s40484-019-0187-4","article-title":"Identifying viruses from metagenomic data using deep learning","volume":"8","author":"Ren","year":"2020","journal-title":"Quant Biol"},{"key":"2022071906083821500_ref24","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/j.ymeth.2020.05.018","article-title":"CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning","volume":"189","author":"Shang","year":"2021","journal-title":"Methods"},{"key":"2022071906083821500_ref25","doi-asserted-by":"crossref","first-page":"5261","DOI":"10.1128\/AEM.00062-07","article-title":"Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy","volume":"73","author":"Wang","year":"2007","journal-title":"Appl Environ Microbiol"},{"key":"2022071906083821500_ref26","doi-asserted-by":"crossref","first-page":"e64328","DOI":"10.1371\/journal.pone.0064328","article-title":"Real time classification of viruses in 12 dimensions","volume":"8","author":"Yu","year":"2013","journal-title":"PLoS One"},{"key":"2022071906083821500_ref27","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1016\/j.ympev.2012.07.003","article-title":"Alignment-free distance measure based on return time distribution for sequence analysis: applications to clustering, molecular phylogeny and subtyping","volume":"65","author":"Kolekar","year":"2012","journal-title":"Mol Phylogenet Evol"},{"key":"2022071906083821500_ref28","article-title":"Attention is all you need","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917)","author":"Vaswani"},{"key":"2022071906083821500_ref29","article-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Devlin"},{"key":"2022071906083821500_ref30","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1186\/s12864-018-5370-x","article-title":"Gene2vec: distributed representation of genes based on co-expression","volume":"20","author":"Du","year":"2019","journal-title":"BMC Genom"},{"key":"2022071906083821500_ref31","doi-asserted-by":"crossref","first-page":"844","DOI":"10.1038\/s41598-020-80670-x","article-title":"A deep learning framework combined with word embedding to identify DNA replication origins","volume":"11","author":"Wu","year":"2021","journal-title":"Sci Rep"},{"key":"2022071906083821500_ref32","article-title":"Distributed representations of words and phrases and their compositionality","volume-title":"Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (NIPS\u201913)","author":"Mikolov"},{"key":"2022071906083821500_ref33","article-title":"Language models are few-shot learners","volume-title":"arXiv preprint","author":"Brown","year":"2020"},{"key":"2022071906083821500_ref34","doi-asserted-by":"crossref","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome","volume":"37","author":"Ji","year":"2021","journal-title":"Bioinformatics"},{"key":"2022071906083821500_ref35","article-title":"RoBERTa: a robustly optimized BERT pretraining approach","volume-title":"arXiv preprint","author":"Liu","year":"2019"},{"key":"2022071906083821500_ref36","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2022071906083821500_ref37","article-title":"DWGSIM: whole genome simulator for next-generation sequencing","author":"Homer","year":"2010","journal-title":"GitHub Repository"},{"key":"2022071906083821500_ref38","doi-asserted-by":"crossref","first-page":"lqab019","DOI":"10.1093\/nargab\/lqab019","article-title":"Sequencing error profiles of Illumina sequencing instruments","volume":"3","author":"Stoler","year":"2021","journal-title":"NAR Genom Bioinform"},{"key":"2022071906083821500_ref39","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/s12985-020-01482-z","article-title":"Comparison of viromes in vaginal secretion from pregnant women with and without vaginitis","volume":"18","author":"Zhang","year":"2021","journal-title":"Virol J"},{"key":"2022071906083821500_ref40","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat Methods"},{"key":"2022071906083821500_ref41","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn. Res."},{"key":"2022071906083821500_ref42","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.65088","article-title":"Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3","volume":"10","author":"Beghini","year":"2021","journal-title":"Elife"},{"key":"2022071906083821500_ref43","doi-asserted-by":"crossref","first-page":"1956","DOI":"10.1038\/s41396-021-00897-y","article-title":"Viromes outperform total metagenomes in revealing the spatiotemporal patterns of agricultural soil viral communities","volume":"15","author":"Santos-Medellin","year":"2021","journal-title":"ISME J"},{"key":"2022071906083821500_ref44","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.virol.2018.01.002","article-title":"Classification and evolution of human papillomavirus genome variants: Alpha-5 (HPV26, 51, 69, 82), Alpha-6 (HPV30, 53, 56, 66), Alpha-11 (HPV34, 73), Alpha-13 (HPV54) and Alpha-3 (HPV61)","volume":"516","author":"Chen","year":"2018","journal-title":"Virology"},{"key":"2022071906083821500_ref45","doi-asserted-by":"crossref","first-page":"D593","DOI":"10.1093\/nar\/gkr859","article-title":"ViPR: an open bioinformatics database and analysis resource for virology research","volume":"40","author":"Pickett","year":"2012","journal-title":"Nucleic Acids Res"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/4\/bbac204\/45016566\/bbac204.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/4\/bbac204\/45016566\/bbac204.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,19]],"date-time":"2022-07-19T06:09:29Z","timestamp":1658210969000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac204\/6603436"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,4]]},"references-count":45,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,7,18]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac204","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,7,18]]},"published":{"date-parts":[[2022,6,4]]},"article-number":"bbac204"}}