{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T19:10:04Z","timestamp":1780773004512,"version":"3.54.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2025,2,8]],"date-time":"2025-02-08T00:00:00Z","timestamp":1738972800000},"content-version":"vor","delay-in-days":78,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62173282"],"award-info":[{"award-number":["62173282"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62472363"],"award-info":[{"award-number":["62472363"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fuzhou Inter-institutional Science and Technology Cooperation Project","award":["2024-Y-018"],"award-info":[{"award-number":["2024-Y-018"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Viruses exert a profound influence on both human health and the global ecosystem, yet they remain largely unexplored. Precise taxonomic classification of viral sequences is essential for discovering novel viruses, elucidating their functions, and assessing their implications for public health and environmental monitoring. Traditional taxonomy methods based on genome references are limited by the vast number of unexplored viruses, rapid mutation rates, and high genetic diversity. Additionally, highly imbalanced species distribution and significant variances in inter-species genomic distances across taxonomic units pose challenges to classifier training. Conceptualizing genomic sequences as sentences in a natural language, large language models provide novel approaches for extracting intrinsic viral genome characteristics. In this study, we introduce ViTax, a virus taxonomy classification tool powered by HyenaDNA, a large language foundation model for long-range genomic sequences at single nucleotide resolution. ViTax integrates supervised prototypical contrastive learning to address the highly imbalanced distributions across various taxonomic clades and demonstrates superior performance to current leading methods in virus taxonomy, particularly significant for long sequences. Moreover, ViTax designs a belief mapping tree using the Lowest Common Ancestor algorithm to adaptively assign a sequence to the lowest taxonomy clade with confidence. For the open-set problem, where sequences belong to novel and unexplored genera, ViTax can adaptively assign them to a higher level of known taxonomy with outstanding performance. These capabilities make ViTax a robust tool for advancing the accuracy and reliability of viral taxonomy classification. The code is available at https:\/\/github.com\/Ying-Lab\/ViTax.<\/jats:p>","DOI":"10.1093\/bib\/bbaf041","type":"journal-article","created":{"date-parts":[[2025,1,20]],"date-time":"2025-01-20T20:18:19Z","timestamp":1737404299000},"source":"Crossref","is-referenced-by-count":4,"title":["ViTax: adaptive hierarchical viral taxonomy classification with a taxonomy belief tree on a foundation model"],"prefix":"10.1093","volume":"26","author":[{"given":"YuShuang","family":"He","sequence":"first","affiliation":[{"name":"Department of Automation, Xiamen University , Xiamen, Fujian 361005,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Feng","family":"Zhou","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , Xiamen, Fujian 361005,","place":["China"]},{"name":"National Institute for Data Science in Health and Medicine, Xiamen University , Xiamen, Fujian 361005,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"JiaXing","family":"Bai","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , Xiamen, Fujian 361005,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"YiChun","family":"Gao","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , Xiamen, Fujian 361005,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaobing","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Medical Oncology, Fuzhou First Hospital Affiliated with Fujian Medical University , Fuzhou, Fujian 350108,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ying","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , Xiamen, Fujian 361005,","place":["China"]},{"name":"National Institute for Data Science in Health and Medicine, Xiamen University , Xiamen, Fujian 361005,","place":["China"]},{"name":"State Key Laboratory of Mariculture Breeding, Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision , Xiamen University, Xiamen, Fujian 350108,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2025,2,8]]},"reference":[{"key":"2025020806461622900_ref1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12951-021-01081-2","article-title":"Advances and insights in the diagnosis of viral infections","volume":"19","author":"Dronina","year":"2021","journal-title":"J Nanobiotechnol"},{"key":"2025020806461622900_ref2","doi-asserted-by":"publisher","first-page":"801","DOI":"10.1038\/nrmicro1750","article-title":"Marine viruses\u2014major players in the global ecosystem","volume":"5","author":"Suttle","year":"2007","journal-title":"Nat Rev Microbiol"},{"key":"2025020806461622900_ref3","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1016\/j.tim.2005.04.003","article-title":"Here a virus, there a virus, everywhere the same virus?","volume":"13","author":"Breitbart","year":"2005","journal-title":"Trends Microbiol"},{"key":"2025020806461622900_ref4","doi-asserted-by":"publisher","first-page":"001840","DOI":"10.1099\/jgv.0.001840","article-title":"Virus taxonomy and the role of the international committee on taxonomy of viruses (ictv)","volume":"104","author":"Siddell","year":"2023","journal-title":"J Gen Virol"},{"key":"2025020806461622900_ref5","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1007\/s00705-023-05797-4","article-title":"Changes to virus taxonomy and the ICTV statutes ratified by the International Committee on Taxonomy of Viruses (2023)","volume":"168","author":"Zerbini","year":"2023","journal-title":"Arch Virol"},{"key":"2025020806461622900_ref6","doi-asserted-by":"crossref","first-page":"3614","DOI":"10.1109\/TPAMI.2020.2981604","article-title":"Recent advances in open set recognition: a survey","volume":"43","author":"Geng","year":"2020","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2025020806461622900_ref7","article-title":"HyenaDNA: long-range genomic sequence modeling at single nucleotide resolution","volume":"36","author":"Nguyen","year":"2024","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025020806461622900_ref8","first-page":"253","article-title":"On finding lowest common ancestors in trees","volume-title":"Proceedings of the fifth annual ACM symposium on Theory of computing","author":"Aho","year":"1973"},{"key":"2025020806461622900_ref9","first-page":"28043","article-title":"Hyena hierarchy: towards larger convolutional language models","volume-title":"International Conference on Machine Learning","author":"Poli","year":"2023"},{"key":"2025020806461622900_ref10","doi-asserted-by":"publisher","first-page":"W6","DOI":"10.1093\/nar\/gkl164","article-title":"BLAST: improvements for better sequence analysis","volume":"34","author":"Ye","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2025020806461622900_ref11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13059-019-1891-0","article-title":"Improved metagenomic analysis with Kraken 2","volume":"20","author":"Wood","year":"2019","journal-title":"Genome Biol"},{"key":"2025020806461622900_ref12","doi-asserted-by":"publisher","first-page":"110414","DOI":"10.1016\/j.ygeno.2022.110414","article-title":"VirusTaxo: taxonomic classification of viruses from the genome sequence using k-mer enrichment","volume":"114","author":"Raju","year":"2022","journal-title":"Genomics"},{"key":"2025020806461622900_ref13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-015-1419-2","article-title":"CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers","volume":"16","author":"Ounit","year":"2015","journal-title":"BMC Genomics"},{"key":"2025020806461622900_ref14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13059-019-1817-x","article-title":"Robust taxonomic classification of uncharted microbial sequences and bins with cat and bat","volume":"20","author":"Meijenfeldt","year":"2019","journal-title":"Genome Biol"},{"key":"2025020806461622900_ref15","doi-asserted-by":"publisher","first-page":"bbac505","DOI":"10.1093\/bib\/bbac505","article-title":"Virus classification for viral genomic fragments using phagcn2","volume":"24","author":"Jiang","year":"2023","journal-title":"Brief Bioinform"},{"key":"2025020806461622900_ref16","doi-asserted-by":"publisher","first-page":"i25","DOI":"10.1093\/bioinformatics\/btab293","article-title":"Bacteriophage classification for assembled contigs using graph convolutional network","volume":"37","author":"Shang","year":"2021","journal-title":"Bioinformatics"},{"key":"2025020806461622900_ref17","doi-asserted-by":"publisher","first-page":"bbad408","DOI":"10.1093\/bib\/bbad408","article-title":"PhaGenus: genus-level classification of bacteriophages using a transformer model","volume":"24","author":"Guan","year":"2023","journal-title":"Brief Bioinform"},{"key":"2025020806461622900_ref18","doi-asserted-by":"publisher","first-page":"e3243","DOI":"10.7717\/peerj.3243","article-title":"vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect archaea and bacteria","volume":"5","author":"Bolduc","year":"2017","journal-title":"PeerJ"},{"key":"2025020806461622900_ref19","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1007\/s00705-022-05694-2","article-title":"Abolishment of morphology-based taxa and change to binomial species names: 2022 taxonomy update of the ICTV Bacterial Viruses Subcommittee","volume":"168","author":"Turner","year":"2023","journal-title":"Arch Virol"},{"key":"2025020806461622900_ref20","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025020806461622900_ref21","doi-asserted-by":"publisher","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome","volume":"37","author":"Ji","year":"2021","journal-title":"Bioinformatics"},{"key":"2025020806461622900_ref22","article-title":"DNABERT-2: efficient foundation model and benchmark for multi-species genome","volume-title":"International Conference on Learning Representations","author":"Zhou"},{"key":"2025020806461622900_ref23","article-title":"Prototypical contrastive learning of unsupervised representations","volume-title":"Proceedings of the Ninth International Conference on Learning Representations","author":"Li"},{"key":"2025020806461622900_ref24","volume-title":"The EM Algorithm and Extensions","author":"McLachlan"},{"key":"2025020806461622900_ref25","article-title":"Understanding contrastive learning via distributionally robust optimization","volume":"36","author":"Junkang","year":"2024","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025020806461622900_ref26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1348\/000711005X48266","article-title":"K-means clustering: a half-century synthesis","volume":"59","author":"Steinley","year":"2006","journal-title":"Br J Math Stat Psychol"},{"key":"2025020806461622900_ref27","doi-asserted-by":"publisher","first-page":"1109","DOI":"10.1016\/j.cell.2019.03.040","article-title":"Marine DNA viral macro-and microdiversity from pole to pole","volume":"177","author":"Gregory","year":"2019","journal-title":"Cell"},{"key":"2025020806461622900_ref28","doi-asserted-by":"publisher","first-page":"717","DOI":"10.1038\/s41559-024-02347-2","article-title":"Biogeographic patterns and drivers of soil viromes","volume":"8","author":"Ma","year":"2024","journal-title":"Nat Ecol Evol"},{"key":"2025020806461622900_ref29","doi-asserted-by":"publisher","first-page":"1098","DOI":"10.1016\/j.cell.2021.01.029","article-title":"Massive expansion of human gut bacteriophage diversity","volume":"184","author":"Camarillo-Guerrero","year":"2021","journal-title":"Cell"},{"key":"2025020806461622900_ref30","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1007\/s00705-023-05906-3","article-title":"Taxonomic update for giant viruses in the order imitervirales (phylum nucleocytoviricota)","volume":"168","author":"Aylward","year":"2023","journal-title":"Arch Virol"},{"key":"2025020806461622900_ref31","doi-asserted-by":"publisher","first-page":"4576","DOI":"10.1111\/1462-2920.15651","article-title":"Dynamics of Baltic Sea phages driven by environmental changes","volume":"23","author":"Hoetzinger","year":"2021","journal-title":"Environ Microbiol"},{"key":"2025020806461622900_ref32","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1128\/MMBR.63.1.106-127.1999","article-title":"Prochlorococcus, a marine photosynthetic prokaryote of global significance","volume":"63","author":"Partensky","year":"1999","journal-title":"Microbiol Mol Biol Rev"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbaf041\/61807762\/bbaf041.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbaf041\/61807762\/bbaf041.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,8]],"date-time":"2025-02-08T06:46:38Z","timestamp":1738997198000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf041\/8005516"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,22]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf041","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,1]]},"published":{"date-parts":[[2024,11,22]]},"article-number":"bbaf041"}}