{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T15:24:18Z","timestamp":1781018658424,"version":"3.54.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T00:00:00Z","timestamp":1748304000000},"content-version":"vor","delay-in-days":26,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Capital Health Research and Development","award":["2024-1G-4421"],"award-info":[{"award-number":["2024-1G-4421"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Viruses are ubiquitous in nature, yet our understanding of them remains limited. High-throughput sequencing technology facilitates the unbiased revelation of genetic composition in samples; however, viral sequences typically make up a small proportion of the entire sequencing data, making it challenging to accurately identify the few or fragmented viral sequences present in a sample. The limited features and information provided by short sequences result in insufficient resolution of viral sequences by existing models. Therefore, we propose a new model, VirNucPro, for short viral sequence identification. Based on a six-frame translation strategy and large language models, we combine nucleotide and amino acid sequence information to enhance feature extraction for short sequences, achieving high accuracy in identifying short viral sequences. Ablation experiments compared the contributions of nucleotide and amino acid sequence features to the model, confirming that the introduced amino acid features significantly contribute to the classification results. Our model outperforms others, such as GCNFrame, DeepVirFinder, DETIRE, and Virtifier, which have demonstrated good performance in identifying short viral sequences of 300 and 500\u00a0bp. Our model demonstrates excellent performance on carefully created real-world datasets. Additionally, it can scan for prophage regions within long bacterial fragments, offering a wide range of applications. The codes are available at: https:\/\/github.com\/Li-Jing-1997\/VirNucPro.<\/jats:p>","DOI":"10.1093\/bib\/bbaf224","type":"journal-article","created":{"date-parts":[[2025,5,19]],"date-time":"2025-05-19T12:34:47Z","timestamp":1747658087000},"source":"Crossref","is-referenced-by-count":3,"title":["VirNucPro: an identifier for the identification of viral short sequences using six-frame translation and large language models"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3060-2531","authenticated-orcid":false,"given":"Jing","family":"Li","sequence":"first","affiliation":[{"name":"The College of Information Science and Technology, Beijing University of Chemical Technology , No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029 ,","place":["China"]},{"name":"The College of Life Science and Technology, Beijing University of Chemical Technology , No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9393-4288","authenticated-orcid":false,"given":"Jia","family":"Mi","sequence":"additional","affiliation":[{"name":"The College of Information Science and Technology, Beijing University of Chemical Technology , No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wei","family":"Lin","sequence":"additional","affiliation":[{"name":"The College of Life Science and Technology, Beijing University of Chemical Technology , No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fengjuan","family":"Tian","sequence":"additional","affiliation":[{"name":"The College of Life Science and Technology, Beijing University of Chemical Technology , No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jing","family":"Wan","sequence":"additional","affiliation":[{"name":"The College of Information Science and Technology, Beijing University of Chemical Technology , No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jingyang","family":"Gao","sequence":"additional","affiliation":[{"name":"The College of Information Science and Technology, Beijing University of Chemical Technology , No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yigang","family":"Tong","sequence":"additional","affiliation":[{"name":"The College of Life Science and Technology, Beijing University of Chemical Technology , No. 15 North Third Ring East Road, Chaoyang District, Beijing 100029 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2025,5,19]]},"reference":[{"key":"2025052712073215800_ref1","doi-asserted-by":"publisher","first-page":"5128","DOI":"10.1016\/j.cell.2024.07.025","article-title":"Virology-the next fifty years","volume":"187","author":"Holmes","year":"2024","journal-title":"Cell"},{"key":"2025052712073215800_ref2","doi-asserted-by":"publisher","first-page":"1168","DOI":"10.1016\/j.cell.2018.02.043","article-title":"Using metagenomics to characterize an expanding Virosphere","volume":"172","author":"Zhang","year":"2018","journal-title":"Cell"},{"key":"2025052712073215800_ref3","doi-asserted-by":"publisher","first-page":"977","DOI":"10.24272\/j.issn.2095-8137.2022.246","article-title":"Virome in healthy pangolins reveals compatibility with multiple potentially zoonotic viruses","volume":"43","author":"Tian","year":"2022","journal-title":"Zool Res"},{"key":"2025052712073215800_ref4","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1186\/s40168-021-01015-y","article-title":"Accurate and sensitive detection of microbial eukaryotes from whole metagenome shotgun sequencing","volume":"9","author":"Lind","year":"2021","journal-title":"Microbiome"},{"key":"2025052712073215800_ref5","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1186\/s40168-021-01017-w","article-title":"Thousands of previously unknown phages discovered in whole-community human gut metagenomes","volume":"9","author":"Benler","year":"2021","journal-title":"Microbiome"},{"key":"2025052712073215800_ref6","doi-asserted-by":"publisher","first-page":"172829","DOI":"10.1016\/j.scitotenv.2024.172829","article-title":"Viral communities locked in high elevation permafrost up to 100 m in depth on the Tibetan plateau","volume":"932","author":"Wen","year":"2024","journal-title":"Sci Total Environ"},{"key":"2025052712073215800_ref7","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1186\/s13059-024-03236-4","article-title":"Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes","volume":"25","author":"Wu","year":"2024","journal-title":"Genome Biol"},{"key":"2025052712073215800_ref8","doi-asserted-by":"publisher","first-page":"104908","DOI":"10.1016\/j.jcv.2021.104908","article-title":"Benchmark of thirteen bioinformatic pipelines for metagenomic virus diagnostics using datasets from clinical samples","volume":"141","author":"Vries","year":"2021","journal-title":"J Clin Virol"},{"key":"2025052712073215800_ref9","doi-asserted-by":"publisher","first-page":"471","DOI":"10.2478\/jvetres-2019-0067","article-title":"Evaluation of direct metagenomics and target enriched approaches for high-throughput sequencing of field rabies viruses","volume":"63","author":"Or\u0142owska","year":"2019","journal-title":"J Vet Res"},{"key":"2025052712073215800_ref10","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1186\/s40168-023-01533-x","article-title":"Gauge your phage: Benchmarking of bacteriophage identification tools in metagenomic sequencing data","volume":"11","author":"Ho","year":"2023","journal-title":"Microbiome"},{"key":"2025052712073215800_ref11","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbae036","article-title":"VirGrapher: A graph-based viral identifier for long sequences from metagenomes","volume":"25","author":"Miao","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025052712073215800_ref12","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1186\/s40168-020-00990-y","article-title":"VirSorter2: A multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses","volume":"9","author":"Guo","year":"2021","journal-title":"Microbiome"},{"key":"2025052712073215800_ref13","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1186\/s40168-020-00867-0","article-title":"VIBRANT: Automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences","volume":"8","author":"Kieft","year":"2020","journal-title":"Microbiome"},{"key":"2025052712073215800_ref14","doi-asserted-by":"publisher","first-page":"e0110523","DOI":"10.1128\/msystems.01105-23","article-title":"Benchmarking informatics approaches for virus discovery: Caution is needed when combining in silico identification methods","volume":"9","author":"Hegarty","year":"2024","journal-title":"mSystems"},{"key":"2025052712073215800_ref15","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1007\/s40484-019-0187-4","article-title":"Identifying viruses from metagenomic data using deep learning","volume":"8","author":"Ren","year":"2020","journal-title":"Quant Biol"},{"key":"2025052712073215800_ref16","doi-asserted-by":"publisher","first-page":"1216","DOI":"10.1093\/bioinformatics\/btab845","article-title":"Virtifier: A deep learning-based identifier for viral sequences from metagenomes","volume":"38","author":"Miao","year":"2022","journal-title":"Bioinformatics"},{"key":"2025052712073215800_ref17","doi-asserted-by":"publisher","first-page":"1169791","DOI":"10.3389\/fmicb.2023.1169791","article-title":"DETIRE: A hybrid deep learning model for identifying viral sequences from metagenomes","volume":"14","author":"Miao","year":"2023","journal-title":"Front Microbiol"},{"key":"2025052712073215800_ref18","doi-asserted-by":"publisher","first-page":"6929","DOI":"10.1016\/j.cell.2024.09.027","article-title":"Using artificial intelligence to document the hidden RNA virosphere","volume":"187","author":"Hou","year":"2024","journal-title":"Cell"},{"key":"2025052712073215800_ref19","article-title":"DNABERT-S: Pioneering species differentiation with species-aware DNA Embeddings","author":"Zhou","year":"2024","journal-title":"arXiv"},{"key":"2025052712073215800_ref20","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025052712073215800_ref21","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btae188","article-title":"Coding genomes with gapped pattern graph convolutional network","volume":"40","author":"Wang","year":"2024","journal-title":"Bioinformatics"},{"key":"2025052712073215800_ref22","doi-asserted-by":"publisher","first-page":"D772","DOI":"10.1093\/nar\/gkae1007","article-title":"GutMetaNet: An integrated database for exploring horizontal gene transfer and functional redundancy in the human gut microbiome","volume":"53","author":"Jiang","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025052712073215800_ref23","doi-asserted-by":"publisher","first-page":"2063","DOI":"10.1016\/j.chom.2024.10.012","article-title":"A prophage competition element protects salmonella from lysis","volume":"32","author":"Sargen","year":"2024","journal-title":"Cell Host Microbe"},{"key":"2025052712073215800_ref24","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btae704","article-title":"ViraLM: Empowering virus discovery through the genome foundation model","volume":"40","author":"Peng","year":"2024","journal-title":"Bioinformatics"},{"key":"2025052712073215800_ref25","doi-asserted-by":"publisher","first-page":"198569","DOI":"10.1016\/j.virusres.2021.198569","article-title":"Characteristics and genome analysis of a novel bacteriophage IME1323_01, the first temperate bacteriophage induced from Staphylococcus caprae","volume":"305","author":"Tian","year":"2021","journal-title":"Virus Res"},{"key":"2025052712073215800_ref26","doi-asserted-by":"publisher","first-page":"198812","DOI":"10.1016\/j.virusres.2022.198812","article-title":"Molecular dissection of the first Staphylococcus cohnii temperate phage IME1354_01","volume":"318","author":"Tian","year":"2022","journal-title":"Virus Res"},{"key":"2025052712073215800_ref27","doi-asserted-by":"publisher","first-page":"2083","DOI":"10.1101\/gr.218255.116","article-title":"An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics","volume":"27","author":"Omasits","year":"2017","journal-title":"Genome Res"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/3\/bbaf224\/63233428\/bbaf224.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/3\/bbaf224\/63233428\/bbaf224.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T16:07:39Z","timestamp":1748362059000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf224\/8137681"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,1]]},"references-count":27,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf224","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,5]]},"published":{"date-parts":[[2025,5,1]]},"article-number":"bbaf224"}}