{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T22:59:00Z","timestamp":1776466740297,"version":"3.51.2"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"vor","delay-in-days":29,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["9678241"],"award-info":[{"award-number":["9678241"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["9667256"],"award-info":[{"award-number":["9667256"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["7005453"],"award-info":[{"award-number":["7005453"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Hong Kong Innovation and Technology Commission"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages\u2019 functions and roles in microbiomes. High-throughput sequencing enables us to obtain phages in different microbiomes with low cost. However, compared to the fast accumulation of newly identified phages, phage protein classification remains difficult. In particular, a fundamental need is to annotate virion proteins, the structural proteins, such as major tail, baseplate, etc. Although there are experimental methods for virion protein identification, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is a great demand to develop a computational method for fast and accurate phage virion protein (PVP) classification.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this work, we adapted the state-of-the-art image classification model, Vision Transformer, to conduct virion protein classification. By encoding protein sequences into unique images using chaos game representation, we can leverage Vision Transformer to learn both local and global features from sequence \u201cimages\u201d. Our method, PhaVIP, has two main functions: classifying PVP and non-PVP sequences and annotating the types of PVP, such as capsid and tail. We tested PhaVIP on several datasets with increasing difficulty and benchmarked it against alternative tools. The experimental results show that PhaVIP has superior performance. After validating the performance of PhaVIP, we investigated two applications that can use the output of PhaVIP: phage taxonomy classification and phage host prediction. The results showed the benefit of using classified proteins over all proteins.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The web server of PhaVIP is available via: https:\/\/phage.ee.cityu.edu.hk\/phavip. The source code of PhaVIP is available via: https:\/\/github.com\/KennthShang\/PhaVIP.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad229","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T08:15:20Z","timestamp":1688112920000},"page":"i30-i39","source":"Crossref","is-referenced-by-count":23,"title":["PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5974-4985","authenticated-orcid":false,"given":"Jiayu","family":"Shang","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR), China"}]},{"given":"Cheng","family":"Peng","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR), China"}]},{"given":"Xubo","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR), China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1373-8023","authenticated-orcid":false,"given":"Yanni","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, City University of Hong Kong , Hong Kong (SAR), China"}]}],"member":"286","published-online":{"date-parts":[[2023,6,30]]},"reference":[{"key":"2023063008143891500_btad229-B1","doi-asserted-by":"crossref","first-page":"172","DOI":"10.3390\/v3030172","article-title":"Bacteriophage assembly","volume":"3","author":"Aksyuk","year":"2011","journal-title":"Viruses"},{"key":"2023063008143891500_btad229-B2","doi-asserted-by":"crossref","first-page":"1565","DOI":"10.1016\/j.ygeno.2019.09.006","article-title":"Pred-BVP-Unb: fast prediction of bacteriophage virion proteins using un-biased multi-perspective properties with recursive feature elimination","volume":"112","author":"Arif","year":"2020","journal-title":"Genomics"},{"key":"2023063008143891500_btad229-B3","doi-asserted-by":"crossref","first-page":"2943","DOI":"10.2147\/IDR.S218638","article-title":"Phage therapy as a renewed therapeutic approach to mycobacterial infections: a comprehensive review","volume":"12","author":"Azimi","year":"2019","journal-title":"Infect Drug Resist"},{"key":"2023063008143891500_btad229-B4","author":"Baevski","year":"2018"},{"key":"2023063008143891500_btad229-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-021-81063-4","article-title":"Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins","volume":"11","author":"Boeckaerts","year":"2021","journal-title":"Sci Rep"},{"key":"2023063008143891500_btad229-B6","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1046\/j.1365-2958.2001.02228.x","article-title":"Comparative phage genomics and the evolution of siphoviridae: insights from dairy phages","volume":"39","author":"Br\u00fcssow","year":"2001","journal-title":"Mol Microbiol"},{"key":"2023063008143891500_btad229-B7","doi-asserted-by":"crossref","first-page":"e1007845","DOI":"10.1371\/journal.pcbi.1007845","article-title":"PhANNs, a fast and accurate tool and web server to classify phage structural proteins","volume":"16","author":"Cantu","year":"2020","journal-title":"PLoS Comput Biol"},{"key":"2023063008143891500_btad229-B8","doi-asserted-by":"crossref","first-page":"353","DOI":"10.3390\/cells9020353","article-title":"PVPred-SCM: improved prediction and analysis of phage virion proteins using a scoring card method","volume":"9","author":"Charoenkwan","year":"2020","journal-title":"Cells"},{"key":"2023063008143891500_btad229-B9","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1007\/s10822-020-00323-z","article-title":"Meta-iPVP: a sequence-based meta-predictor for improving the prediction of phage virion proteins using effective feature representation","volume":"34","author":"Charoenkwan","year":"2020","journal-title":"J Comput Aided Mol Des"},{"key":"2023063008143891500_btad229-B10","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1146\/annurev-virology-100114-054952","article-title":"Viruses as winners in the game of life","volume":"3","author":"Cobi\u00e1n G\u00fcemes","year":"2016","journal-title":"Annu Rev Virol"},{"key":"2023063008143891500_btad229-B11","author":"Devlin","year":"2018"},{"key":"2023063008143891500_btad229-B12","first-page":"115","author":"Dick","year":"2020"},{"key":"2023063008143891500_btad229-B13","doi-asserted-by":"crossref","first-page":"2229","DOI":"10.1039\/C4MB00316K","article-title":"Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis","volume":"10","author":"Ding","year":"2014","journal-title":"Mol Biosyst"},{"key":"2023063008143891500_btad229-B14","author":"Dosovitskiy","year":"2020"},{"key":"2023063008143891500_btad229-B15","doi-asserted-by":"crossref","first-page":"1249","DOI":"10.1038\/s41564-019-0511-9","article-title":"Towards a genome-based virus taxonomy","volume":"4","author":"Eloe-Fadrosh","year":"2019","journal-title":"Nat Microbiol"},{"key":"2023063008143891500_btad229-B16","doi-asserted-by":"crossref","first-page":"6309","DOI":"10.1128\/AEM.01212-12","article-title":"Dynamic viral populations in hypersaline systems as revealed by metagenomic assembly","volume":"78","author":"Emerson","year":"2012","journal-title":"Appl Environ Microbiol"},{"key":"2023063008143891500_btad229-B17","doi-asserted-by":"crossref","first-page":"giac076","DOI":"10.1093\/gigascience\/giac076","article-title":"DeePVP: identification and classification of phage virion proteins using deep learning","volume":"11","author":"Fang","year":"2022","journal-title":"Gigascience"},{"key":"2023063008143891500_btad229-B18","doi-asserted-by":"crossref","first-page":"615711","DOI":"10.3389\/fmicb.2021.615711","article-title":"VirionFinder: identification of complete and partial prokaryote virus virion protein from virome data using the sequence and biochemical properties of amino acids","volume":"12","author":"Fang","year":"2021","journal-title":"Front Microbiol"},{"key":"2023063008143891500_btad229-B19","doi-asserted-by":"crossref","first-page":"530696","DOI":"10.1155\/2013\/530696","article-title":"Naive Bayes classifier with feature selection to identify phage virion proteins","volume":"2013","author":"Feng","year":"2013","journal-title":"Comput Math Methods Med"},{"key":"2023063008143891500_btad229-B20","doi-asserted-by":"crossref","first-page":"1171","DOI":"10.1038\/s41396-018-0049-5","article-title":"Phage or foe: an insight into the impact of viral predation on microbial communities","volume":"12","author":"Fern\u00e1ndez","year":"2018","journal-title":"ISME J"},{"key":"2023063008143891500_btad229-B21","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1016\/0263-7855(94)80109-6","article-title":"Chaos game representation of protein structures","volume":"12","author":"Fiser","year":"1994","journal-title":"J Mol Graph"},{"key":"2023063008143891500_btad229-B22","author":"Ghiasi","year":"2022"},{"key":"2023063008143891500_btad229-B23","doi-asserted-by":"crossref","first-page":"1506","DOI":"10.3390\/sym13081506","article-title":"iPVP-MCV: a multi-classifier voting model for the accurate identification of phage virion proteins","volume":"13","author":"Han","year":"2021","journal-title":"Symmetry"},{"key":"2023063008143891500_btad229-B24","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.ygeno.2016.08.002","article-title":"Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison","volume":"108","author":"Hoang","year":"2016","journal-title":"Genomics"},{"key":"2023063008143891500_btad229-B25","doi-asserted-by":"crossref","first-page":"e11396","DOI":"10.7717\/peerj.11396","article-title":"BACPHLIP: predicting bacteriophage lifestyle from conserved protein domains","volume":"9","author":"Hockenberry","year":"2021","journal-title":"PeerJ"},{"key":"2023063008143891500_btad229-B26","doi-asserted-by":"crossref","first-page":"1511","DOI":"10.1038\/ismej.2017.16","article-title":"Lysogeny in nature: mechanisms, impact and ecology of temperate phages","volume":"11","author":"Howard-Varona","year":"2017","journal-title":"ISME J"},{"key":"2023063008143891500_btad229-B27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-11-119","article-title":"Prodigal: prokaryotic gene recognition and translation initiation site identification","volume":"11","author":"Hyatt","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023063008143891500_btad229-B28","doi-asserted-by":"crossref","first-page":"2163","DOI":"10.1093\/nar\/18.8.2163","article-title":"Chaos game representation of gene structure","volume":"18","author":"Jeffrey","year":"1990","journal-title":"Nucleic Acids Res"},{"key":"2023063008143891500_btad229-B29","first-page":"11","article-title":"Large-scale comparative review and assessment of computational methods for phage virion proteins identification","volume":"21","author":"Kabir","year":"2022","journal-title":"Excli J"},{"key":"2023063008143891500_btad229-B30","doi-asserted-by":"crossref","first-page":"295","DOI":"10.24171\/j.phrp.2019.10.5.06","article-title":"Osong public health and research perspectives","volume":"10","author":"Lee","year":"2019","journal-title":"Osong Public Health Res Perspect"},{"key":"2023063008143891500_btad229-B31","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023063008143891500_btad229-B32","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1093\/bioinformatics\/btz493","article-title":"Deep learning on chaos game representation for proteins","volume":"36","author":"L\u00f6chel","year":"2020","journal-title":"Bioinformatics"},{"key":"2023063008143891500_btad229-B33","doi-asserted-by":"crossref","first-page":"6263","DOI":"10.1016\/j.csbj.2021.11.008","article-title":"Chaos game representation and its applications in bioinformatics","volume":"19","author":"L\u00f6chel","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2023063008143891500_btad229-B34","doi-asserted-by":"crossref","first-page":"1746","DOI":"10.1001\/jama.2017.12938","article-title":"Phage therapy\u2019s role in combating antibiotic-resistant pathogens","volume":"318","author":"Lyon","year":"2017","journal-title":"JAMA"},{"key":"2023063008143891500_btad229-B35","doi-asserted-by":"crossref","first-page":"476","DOI":"10.3389\/fmicb.2018.00476","article-title":"PVP-SVM: sequence-based prediction of phage virion proteins using a support vector machine","volume":"9","author":"Manavalan","year":"2018","journal-title":"Front Microbiol"},{"key":"2023063008143891500_btad229-B36","doi-asserted-by":"crossref","first-page":"140406","DOI":"10.1016\/j.bbapap.2020.140406","article-title":"Review and comparative analysis of machine learning-based phage virion protein identification methods","volume":"1868","author":"Meng","year":"2020","journal-title":"Biochim Biophys Acta Proteins Proteom"},{"key":"2023063008143891500_btad229-B37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/femsle\/fnu022","article-title":"Bacterial genome remodeling through bacteriophage recombination","volume":"362","author":"Menouni","year":"2015","journal-title":"FEMS Microbiol Lett"},{"key":"2023063008143891500_btad229-B38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12866-021-02256-5","article-title":"Application of machine learning in bacteriophage research","volume":"21","author":"Nami","year":"2021","journal-title":"BMC Microbiol"},{"key":"2023063008143891500_btad229-B39","doi-asserted-by":"crossref","first-page":"1779","DOI":"10.3390\/ijms19061779","article-title":"Identification of bacteriophage virion proteins using multinomial naive Bayes with g-gap feature tree","volume":"19","author":"Pan","year":"2018","journal-title":"IJMS"},{"key":"2023063008143891500_btad229-B40","doi-asserted-by":"crossref","first-page":"e1009492","DOI":"10.1371\/journal.pcbi.1009492","article-title":"Constructing benchmark test sets for biological sequence analysis using independent set algorithms","volume":"18","author":"Petti","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2023063008143891500_btad229-B41","first-page":"12116","article-title":"Do vision transformers see like convolutional neural networks?","volume":"34","author":"Raghu","year":"2021","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023063008143891500_btad229-B42","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1093\/bioinformatics\/btab681","article-title":"Prediction of antimicrobial resistance based on whole-genome sequencing and machine learning","volume":"38","author":"Ren","year":"2022","journal-title":"Bioinformatics"},{"key":"2023063008143891500_btad229-B43","doi-asserted-by":"crossref","first-page":"e985","DOI":"10.7717\/peerj.985","article-title":"VirSorter: mining viral signal from microbial genomic data","volume":"3","author":"Roux","year":"2015","journal-title":"PeerJ"},{"key":"2023063008143891500_btad229-B44","doi-asserted-by":"crossref","first-page":"507","DOI":"10.3389\/fmicb.2019.00507","article-title":"Identification of phage viral proteins with hybrid sequence features","volume":"10","author":"Ru","year":"2019","journal-title":"Front Microbiol"},{"key":"2023063008143891500_btad229-B45","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1002657","article-title":"Artificial neural networks trained to detect viral and phage structural proteins","volume-title":"PLoS Comput Biol","author":"Seguritan","year":"2012"},{"key":"2023063008143891500_btad229-B46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12915-021-01180-4","article-title":"Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning","volume":"19","author":"Shang","year":"2021","journal-title":"BMC Biol"},{"key":"2023063008143891500_btad229-B47","doi-asserted-by":"crossref","first-page":"bbac182","DOI":"10.1093\/bib\/bbac182","article-title":"CHERRY: a computational metHod for accuratE pRediction of virus\u2013pRokarYotic interactions using a graph encoder\u2013decoder model","volume":"23","author":"Shang","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023063008143891500_btad229-B48","doi-asserted-by":"crossref","first-page":"197884","DOI":"10.1016\/j.virusres.2020.197884","article-title":"Characterization and genome analysis of B1 sub-cluster mycobacteriophage PDRPxv","volume":"279","author":"Sinha","year":"2020","journal-title":"Virus Res"},{"key":"2023063008143891500_btad229-B49","doi-asserted-by":"crossref","first-page":"10584","DOI":"10.1073\/pnas.93.20.10584","article-title":"Crystal structure of phage P22 tailspike protein complexed with Salmonella sp. O-antigen receptors","volume":"93","author":"Steinbacher","year":"1996","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023063008143891500_btad229-B50","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1038\/nsmb874","article-title":"Crystal structure of the polysialic acid\u2013degrading endosialidase of bacteriophage K1F","volume":"12","author":"Stummeyer","year":"2005","journal-title":"Nat Struct Mol Biol"},{"key":"2023063008143891500_btad229-B51","doi-asserted-by":"crossref","first-page":"2000","DOI":"10.3390\/molecules23082000","article-title":"Identifying phage virion proteins by using two-step feature selection methods","volume":"23","author":"Tan","year":"2018","journal-title":"Molecules"},{"key":"2023063008143891500_btad229-B52","first-page":"5998","author":"Vaswani","year":"2017"},{"key":"2023063008143891500_btad229-B53","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2174\/1389450043490668","article-title":"Epitope identification and discovery using phage display libraries: applications in vaccine development and diagnostics","volume":"5","author":"Wang","year":"2004","journal-title":"Curr Drug Targets"},{"key":"2023063008143891500_btad229-B54","first-page":"1810","author":"Wang","year":"2019"},{"key":"2023063008143891500_btad229-B55","doi-asserted-by":"crossref","first-page":"21734","DOI":"10.3390\/ijms160921734","article-title":"An ensemble method to distinguish bacteriophage virion from non-virion proteins based on protein sequence characteristics","volume":"16","author":"Zhang","year":"2015","journal-title":"Int J Mol Sci"},{"key":"2023063008143891500_btad229-B56","doi-asserted-by":"crossref","first-page":"1032186","DOI":"10.3389\/fmicb.2022.1032186","article-title":"Phage family classification under Caudoviricetes: a review of current tools using the latest ICTV classification framework","volume":"13","author":"Zhu","year":"2022","journal-title":"Front Microbiol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/Supplement_1\/i30\/50741407\/btad229.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/Supplement_1\/i30\/50741407\/btad229.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T08:16:03Z","timestamp":1688112963000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/39\/Supplement_1\/i30\/7210437"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,1]]},"references-count":56,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2023,6,30]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad229","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,6,1]]},"published":{"date-parts":[[2023,6,1]]}}}