{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T15:58:44Z","timestamp":1776787124483,"version":"3.51.2"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1007845","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T00:00:00Z","timestamp":1605139200000}}],"reference-count":35,"publisher":"Public Library of Science (PLoS)","issue":"11","license":[{"start":{"date-parts":[[2020,11,2]],"date-time":"2020-11-02T00:00:00Z","timestamp":1604275200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100011039","name":"Intelligence Advanced Research Projects Activity","doi-asserted-by":"publisher","award":["W911NF-17-2-0105"],"award-info":[{"award-number":["W911NF-17-2-0105"]}],"id":[{"id":"10.13039\/100011039","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000121","name":"Division of Mathematical Sciences","doi-asserted-by":"publisher","award":["0827278"],"award-info":[{"award-number":["0827278"]}],"id":[{"id":"10.13039\/100000121","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>\n                    For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50\u201390% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. 
To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an \u201cother\u201d category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F<jats:sub>1<\/jats:sub>-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as \u201cother,\u201d providing a new approach for functional annotation of phage proteins. 
PhANNs is open source and can be run from our web server or installed locally.\n                  <\/jats:p>","DOI":"10.1371\/journal.pcbi.1007845","type":"journal-article","created":{"date-parts":[[2020,11,2]],"date-time":"2020-11-02T13:47:47Z","timestamp":1604324867000},"page":"e1007845","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":83,"title":["PhANNs, a fast and accurate tool and web server to classify phage structural proteins"],"prefix":"10.1371","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3168-5140","authenticated-orcid":true,"given":"Vito Adrian","family":"Cantu","sequence":"first","affiliation":[]},{"given":"Peter","family":"Salamon","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6893-2846","authenticated-orcid":true,"given":"Victor","family":"Seguritan","sequence":"additional","affiliation":[]},{"given":"Jackson","family":"Redfield","sequence":"additional","affiliation":[]},{"given":"David","family":"Salamon","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8383-8949","authenticated-orcid":true,"given":"Robert A.","family":"Edwards","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8454-5248","authenticated-orcid":true,"given":"Anca M.","family":"Segall","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2020,11,2]]},"reference":[{"issue":"1","key":"pcbi.1007845.ref001","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1146\/annurev-virology-100114-054952","article-title":"Viruses as Winners in the Game of Life.","volume":"3","author":"AG Cobi\u00e1n G\u00fcemes","year":"2016","journal-title":"Annu Rev Virol."},{"issue":"5270","key":"pcbi.1007845.ref002","doi-asserted-by":"crossref","first-page":"1910","DOI":"10.1126\/science.272.5270.1910","article-title":"Lysogenic conversion by a filamentous phage encoding 
cholera toxin","volume":"272","author":"MK Waldor","year":"1996","journal-title":"Science"},{"issue":"7","key":"pcbi.1007845.ref003","doi-asserted-by":"crossref","first-page":"754","DOI":"10.1038\/s41564-018-0166-y","article-title":"Phage puppet masters of the marine microbial realm.","volume":"3","author":"M Breitbart","year":"2018","journal-title":"Nat Microbiol."},{"issue":"6","key":"pcbi.1007845.ref004","doi-asserted-by":"crossref","first-page":"1150","DOI":"10.1038\/ismej.2013.4","article-title":"Structure and function of a cyanophage-encoded peptide deformylase.","volume":"7","author":"JA Frank","year":"2013","journal-title":"ISME J."},{"issue":"7595","key":"pcbi.1007845.ref005","doi-asserted-by":"crossref","first-page":"466","DOI":"10.1038\/nature17193","article-title":"Lytic to temperate switching of viral communities","volume":"531","author":"B Knowles","year":"2016","journal-title":"Nature"},{"key":"pcbi.1007845.ref006","first-page":"114819","article-title":"Prophage genomics reveals patterns in phage genome organization and replication","author":"HS Kang","year":"2017","journal-title":"bioRxiv"},{"issue":"6","key":"pcbi.1007845.ref007","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1038\/nrmicro1163","article-title":"Viral metagenomics.","volume":"3","author":"RA Edwards","year":"2005","journal-title":"Nat Rev Microbiol"},{"issue":"4","key":"pcbi.1007845.ref008","doi-asserted-by":"crossref","first-page":"343","DOI":"10.3390\/v11040343","article-title":"Current State of Compassionate Phage Therapy.","volume":"11","author":"S McCallin","year":"2019","journal-title":"Viruses."},{"issue":"1","key":"pcbi.1007845.ref009","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1146\/annurev-micro-090817-062535","article-title":"Phage Therapy in the Twenty-First Century: Facing the Decline of the Antibiotic Era; Is It Finally Time for the Age of the Phage?","volume":"73","author":"S Hesse","year":"2019","journal-title":"Annu Rev 
Microbiol"},{"issue":"8","key":"pcbi.1007845.ref010","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1002657","article-title":"Artificial Neural Networks Trained to Detect Viral and Phage Structural Proteins.","volume":"8","author":"V Seguritan","year":"2012","journal-title":"PLoS Comput Biol."},{"issue":"9","key":"pcbi.1007845.ref011","doi-asserted-by":"crossref","first-page":"1405","DOI":"10.1093\/bioinformatics\/btv727","article-title":"VIRALpro: A tool to identify viral capsid and tail sequences","volume":"32","author":"C Galiez","year":"2016","journal-title":"Bioinformatics"},{"key":"pcbi.1007845.ref012","volume":"45","author":"BC Cs\u00e1ji","year":"2001","journal-title":"Approximation with Artificial Neural Networks"},{"issue":"3","key":"pcbi.1007845.ref013","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1128\/MMBR.00014-11","article-title":"A common evolutionary origin for tailed bacteriophage functional modules and bacterial machineries","volume":"75","author":"D Veesler","year":"2011","journal-title":"Micr Mol Biol Rev"},{"issue":"2","key":"pcbi.1007845.ref014","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MIS.2009.36","article-title":"The Unreasonable Effectiveness of Data","volume":"24","author":"A Halevy","year":"2009","journal-title":"IEEE Intell Syst"},{"issue":"D1","key":"pcbi.1007845.ref015","doi-asserted-by":"crossref","first-page":"D535","DOI":"10.1093\/nar\/gkw1017","article-title":"Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center","volume":"45","author":"AR Wattam","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1007845.ref016","article-title":"PHANOTATE: a novel approach to gene identification in phage genomes.","author":"K McNair","year":"2019","journal-title":"Bioinforma Oxf Engl"},{"issue":"13","key":"pcbi.1007845.ref017","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast 
program for clustering and comparing large sets of protein or nucleotide sequences.","volume":"22","author":"W Li","year":"2006","journal-title":"Bioinforma Oxf Engl"},{"issue":"2","key":"pcbi.1007845.ref018","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1093\/protein\/4.2.155","article-title":"Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence.","volume":"4","author":"K Guruprasad","year":"1990","journal-title":"Protein Eng Des Sel"},{"issue":"15","key":"pcbi.1007845.ref019","doi-asserted-by":"crossref","first-page":"3174","DOI":"10.1093\/nar\/22.15.3174","article-title":"Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes","volume":"22","author":"JR Lobry","year":"1994","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"pcbi.1007845.ref020","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/0022-2836(82)90515-0","article-title":"A simple method for displaying the hydropathic character of a protein","volume":"157","author":"J Kyte","year":"1982","journal-title":"J Mol Biol"},{"issue":"11","key":"pcbi.1007845.ref021","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available Python tools for computational molecular biology and bioinformatics.","volume":"25","author":"PJA Cock","year":"2009","journal-title":"Bioinforma Oxf Engl"},{"key":"pcbi.1007845.ref022","unstructured":"Chollet F, others. Keras [Internet]. 2015. 
Available from: https:\/\/keras.io"},{"key":"pcbi.1007845.ref023","article-title":"TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems","author":"Mart\u00edn Abadi","year":"2015"},{"issue":"4","key":"pcbi.1007845.ref024","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1016\/0022-2836(91)90662-P","article-title":"Single mutations in a gene for a tail fiber component of an Escherichia coli phage can cause an extension from a protein to a carbohydrate as a receptor","volume":"219","author":"K Drexler","year":"1991","journal-title":"J Mol Biol"},{"issue":"4","key":"pcbi.1007845.ref025","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1016\/S0923-2508(03)00069-X","article-title":"The diversity and evolution of the T4-type bacteriophages","volume":"154","author":"C Desplats","year":"2003","journal-title":"Res Microbiol"},{"issue":"4","key":"pcbi.1007845.ref026","doi-asserted-by":"crossref","first-page":"388","DOI":"10.1016\/j.mib.2007.06.004","article-title":"Diversity-generating retroelements.","volume":"10","author":"B Medhekar","year":"2007","journal-title":"Curr Opin Microbiol"},{"issue":"1","key":"pcbi.1007845.ref027","doi-asserted-by":"crossref","DOI":"10.1093\/femsle\/fnw235","article-title":"R-type bacteriocins in related strains of Xenorhabdus bovienii: Xenorhabdicin tail fiber modularity and contribution to competitiveness","volume":"364","author":"K Ciezki","year":"2017","journal-title":"FEMS Microbiol Lett"},{"issue":"2","key":"pcbi.1007845.ref028","article-title":"Parallel Evolution of Host-Attachment Proteins in Phage PP01 Populations Adapting to Escherichia coli O157:H7.","volume":"11","author":"C Akusobi","year":"2018","journal-title":"Pharm Basel Switz."},{"issue":"1","key":"pcbi.1007845.ref029","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1186\/s40168-018-0573-6","article-title":"A diversity-generating retroelement encoded by a globally ubiquitous Bacteroides phage.","volume":"6","author":"S 
Benler","year":"2018","journal-title":"Microbiome"},{"issue":"1","key":"pcbi.1007845.ref030","doi-asserted-by":"crossref","DOI":"10.1128\/mBio.01051-13","article-title":"A Broadly Implementable Research Course in Phage Discovery and Genomics for First-Year Undergraduate Students.","volume":"5","author":"TC Jordan","year":"2014","journal-title":"mBio"},{"key":"pcbi.1007845.ref031","doi-asserted-by":"crossref","unstructured":"Kanda N, Takeda R, Obuchi Y. Elastic spectral distortion for low resource speech recognition with deep neural networks. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding. 2013. p. 309\u201314.","DOI":"10.1109\/ASRU.2013.6707748"},{"key":"pcbi.1007845.ref032","doi-asserted-by":"crossref","unstructured":"Ciregan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. 2012. p. 3642\u20139.","DOI":"10.1109\/CVPR.2012.6248110"},{"key":"pcbi.1007845.ref033","first-page":"2013","article-title":"Na\u00efve bayes classifier with feature selection to identify phage virion proteins","author":"P-M Feng","year":"2013","journal-title":"Comput Math Methods Med"},{"issue":"9","key":"pcbi.1007845.ref034","doi-asserted-by":"crossref","first-page":"21734","DOI":"10.3390\/ijms160921734","article-title":"An ensemble method to distinguish bacteriophage virion from non-virion proteins based on protein sequence characteristics","volume":"16","author":"L Zhang","year":"2015","journal-title":"Int J Mol Sci"},{"key":"pcbi.1007845.ref035","doi-asserted-by":"crossref","DOI":"10.3389\/fmicb.2018.00476","article-title":"PVP-SVM: Sequence-based prediction of phage virion proteins using a support vector machine.","volume":"9","author":"B Manavalan","year":"2018","journal-title":"Front Microbiol"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1007845","type":"new_version","label":"New 
version","source":"publisher","updated":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T00:00:00Z","timestamp":1605139200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1007845","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T14:05:50Z","timestamp":1605189950000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1007845"}},"subtitle":[],"editor":[{"given":"Mihaela","family":"Pertea","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,11,2]]},"references-count":35,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,11,2]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1007845","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.04.03.023523","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,2]]}}}