{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T06:36:45Z","timestamp":1776321405198,"version":"3.50.1"},"reference-count":16,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2024,11,23]],"date-time":"2024-11-23T00:00:00Z","timestamp":1732320000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Hong Kong Research Grants Council","award":["11209823"],"award-info":[{"award-number":["11209823"]}]},{"DOI":"10.13039\/501100007156","name":"Hong Kong Innovation and Technology Fund","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007156","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Viruses, with their ubiquitous presence and high diversity, play pivotal roles in ecological systems and public health. Accurate identification of viruses in various ecosystems is essential for comprehending their variety and assessing their ecological influence. Metagenomic sequencing has become a major strategy to survey the viruses in various ecosystems. However, accurate and comprehensive virus detection in metagenomic data remains difficult. Limited reference sequences prevent alignment-based methods from identifying novel viruses. Machine learning-based tools are more promising in novel virus detection but often miss short viral contigs, which are abundant in typical metagenomic data. The inconsistency in virus search results produced by available tools further highlights the urgent need for a more robust tool for virus identification.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this work, we develop ViraLM for identifying novel viral contigs in metagenomic data. By using the latest genome foundation model as the backbone and training on a rigorously constructed dataset, the model is able to distinguish viruses from other organisms based on the learned genomic characteristics. We thoroughly tested ViraLM on multiple datasets and the experimental results show that ViraLM outperforms available tools in different scenarios. In particular, ViraLM improves the F1-score on short contigs by 22%.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The source code of ViraLM is available via: https:\/\/github.com\/ChengPENG-wolf\/ViraLM.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae704","type":"journal-article","created":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T07:28:18Z","timestamp":1732001298000},"source":"Crossref","is-referenced-by-count":20,"title":["ViraLM: empowering virus discovery through the genome foundation model"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8566-0707","authenticated-orcid":false,"given":"Cheng","family":"Peng","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR),","place":["China"]}]},{"given":"Jiayu","family":"Shang","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, The Chinese University of Hong Kong , Hong Kong (SAR),","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9200-4862","authenticated-orcid":false,"given":"Jiaojiao","family":"Guan","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR),","place":["China"]}]},{"given":"Donglin","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Environmental Science and Engineering, Shandong University , Qingdao 266200,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1373-8023","authenticated-orcid":false,"given":"Yanni","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR),","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2024,11,23]]},"reference":[{"key":"2024121100225074400_btae704-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2024121100225074400_btae704-B2","first-page":"1","article-title":"Identification of mobile genetic elements with genomad","volume":"42","author":"Camargo","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2024121100225074400_btae704-B3","author":"Dalla-Torre","year":"2023"},{"key":"2024121100225074400_btae704-B4","author":"Devlin","year":"2018"},{"key":"2024121100225074400_btae704-B5","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1186\/s13059-024-03320-9","article-title":"Virrep: a hybrid language representation learning framework for identifying viruses from human gut metagenomes","volume":"25","author":"Dong","year":"2024","journal-title":"Genome Biol"},{"key":"2024121100225074400_btae704-B6","doi-asserted-by":"crossref","first-page":"916","DOI":"10.3390\/life12060916","article-title":"Microbial community composition of the antarctic ecosystems: review of the bacteria, fungi, and archaea identified through an NGS-based metagenomics approach","volume":"12","author":"Doytchinov","year":"2022","journal-title":"Life"},{"key":"2024121100225074400_btae704-B7","doi-asserted-by":"crossref","first-page":"3989","DOI":"10.1016\/j.vaccine.2020.04.011","article-title":"Arboviruses: a global public health threat","volume":"38","author":"Girard","year":"2020","journal-title":"Vaccine"},{"key":"2024121100225074400_btae704-B8","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1186\/s40168-020-00990-y","article-title":"Virsorter2: a multi-classifier, expert-guided approach to detect diverse dna and rna viruses","volume":"9","author":"Guo","year":"2021","journal-title":"Microbiome"},{"key":"2024121100225074400_btae704-B9","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1186\/s40168-020-00867-0","article-title":"Vibrant: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences","volume":"8","author":"Kieft","year":"2020","journal-title":"Microbiome"},{"key":"2024121100225074400_btae704-B10","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1111\/1462-2920.16207","article-title":"The global virome: How much diversity and how many independent origins?","volume":"25","author":"Koonin","year":"2023","journal-title":"Environ Microbiol"},{"key":"2024121100225074400_btae704-B11","doi-asserted-by":"crossref","first-page":"e1011422","DOI":"10.1371\/journal.pcbi.1011422","article-title":"Virify: an integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models","volume":"19","author":"Rangel-Pineros","year":"2023","journal-title":"PLoS Comput Biol"},{"key":"2024121100225074400_btae704-B12","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1007\/s40484-019-0187-4","article-title":"Identifying viruses from metagenomic data using deep learning","volume":"8","author":"Ren","year":"2020","journal-title":"Quant Biol"},{"key":"2024121100225074400_btae704-B13","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1186\/s12864-023-09737-z","article-title":"Untangling an insect\u2019s virome from its endogenous viral elements","volume":"24","author":"Rozo-Lopez","year":"2023","journal-title":"BMC Genomics"},{"key":"2024121100225074400_btae704-B14","doi-asserted-by":"crossref","first-page":"bbac182","DOI":"10.1093\/bib\/bbac182","article-title":"Cherry: a computational method for accurate prediction of virus\u2013prokaryotic interactions using a graph encoder\u2013decoder model","volume":"23","author":"Shang","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024121100225074400_btae704-B15","doi-asserted-by":"crossref","first-page":"2080","DOI":"10.1080\/22221751.2022.2109516","article-title":"Total rna sequencing of phlebotomus chinensis sandflies in China revealed viral, bacterial, and eukaryotic microbes potentially pathogenic to humans","volume":"11","author":"Wang","year":"2022","journal-title":"Emerg Microbes Infect"},{"key":"2024121100225074400_btae704-B16","author":"Zhou","year":"2023"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae704\/60797056\/btae704.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/12\/btae704\/61007848\/btae704.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/12\/btae704\/61007848\/btae704.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,10]],"date-time":"2024-12-10T20:05:09Z","timestamp":1733861109000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae704\/7907584"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,11,23]]},"references-count":16,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,11,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae704","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.01.30.577935","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,12]]},"published":{"date-parts":[[2024,11,23]]},"article-number":"btae704"}}