{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T04:36:01Z","timestamp":1775277361208,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1011831","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2024,8,15]],"date-time":"2024-08-15T00:00:00Z","timestamp":1723680000000}}],"reference-count":45,"publisher":"Public Library of Science (PLoS)","issue":"8","license":[{"start":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T00:00:00Z","timestamp":1722816000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002913","name":"Vlaamse Overheid","doi-asserted-by":"publisher","award":["Onderzoeksprogramma Artifici\u00eble Intelligentie (AI) Vlaanderen"],"award-info":[{"award-number":["Onderzoeksprogramma Artifici\u00eble Intelligentie (AI) Vlaanderen"]}],"id":[{"id":"10.13039\/501100002913","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002913","name":"Vlaamse Overheid","doi-asserted-by":"publisher","award":["Onderzoeksprogramma Artifici\u00eble Intelligentie (AI) Vlaanderen"],"award-info":[{"award-number":["Onderzoeksprogramma Artifici\u00eble Intelligentie (AI) Vlaanderen"]}],"id":[{"id":"10.13039\/501100002913","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100016386","name":"Conselleria de Innovaci\u00f3n, Universidades, Ciencia y Sociedad Digital, Generalitat Valenciana","doi-asserted-by":"publisher","award":["GRISOLIAP\/2020\/158"],"award-info":[{"award-number":["GRISOLIAP\/2020\/158"]}],"id":[{"id":"10.13039\/501100016386","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011033","name":"Agencia Estatal de 
Investigaci\u00f3n","doi-asserted-by":"publisher","award":["RYC2019-028015-I"],"award-info":[{"award-number":["RYC2019-028015-I"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001704","name":"European Society of Clinical Microbiology and Infectious Diseases","doi-asserted-by":"publisher","award":["20200063"],"award-info":[{"award-number":["20200063"]}],"id":[{"id":"10.13039\/501100001704","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011033","name":"Agencia Estatal de Investigaci\u00f3n","doi-asserted-by":"publisher","award":["PID2020-112835RA-I00"],"award-info":[{"award-number":["PID2020-112835RA-I00"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100016386","name":"Conselleria de Innovaci\u00f3n, Universidades, Ciencia y Sociedad Digital, Generalitat Valenciana","doi-asserted-by":"publisher","award":["SEJIGENT\/2021\/014"],"award-info":[{"award-number":["SEJIGENT\/2021\/014"]}],"id":[{"id":"10.13039\/501100016386","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003130","name":"Fonds Wetenschappelijk Onderzoek","doi-asserted-by":"publisher","award":["1S69520N"],"award-info":[{"award-number":["1S69520N"]}],"id":[{"id":"10.13039\/501100003130","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Bacteriophages (phages) are viruses that infect bacteria. Many of them produce specific enzymes called depolymerases to break down external polysaccharide structures. Accurate annotation and domain identification of these depolymerases are challenging due to their inherent sequence diversity. 
Hence, we present DepoScope, a machine learning tool that combines a fine-tuned ESM-2 model with a convolutional neural network to identify depolymerase sequences and their enzymatic domains precisely. To accomplish this, we curated a dataset from the INPHARED phage genome database, created a polysaccharide-degrading domain database, and applied sequential filters to construct a high-quality dataset, which is subsequently used to train DepoScope. Our work is the first approach that combines sequence-level predictions with amino-acid-level predictions for accurate depolymerase detection and functional domain identification. In that way, we believe that DepoScope can greatly enhance our understanding of phage-host interactions at the level of depolymerases.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1011831","type":"journal-article","created":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T13:53:15Z","timestamp":1722865995000},"page":"e1011831","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":26,"title":["DepoScope: Accurate phage depolymerase annotation and domain delineation using large language models"],"prefix":"10.1371","volume":"20","author":[{"given":"Robby","family":"Concha-Eloko","sequence":"first","affiliation":[]},{"given":"Michiel","family":"Stock","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3876-620X","authenticated-orcid":true,"given":"Bernard","family":"De 
Baets","sequence":"additional","affiliation":[]},{"given":"Yves","family":"Briers","sequence":"additional","affiliation":[]},{"given":"Rafael","family":"Sanju\u00e1n","sequence":"additional","affiliation":[]},{"given":"Pilar","family":"Domingo-Calap","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7668-2840","authenticated-orcid":true,"given":"Dimitri","family":"Boeckaerts","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2024,8,5]]},"reference":[{"issue":"6","key":"pcbi.1011831.ref001","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1016\/j.str.2020.04.015","article-title":"Functional Studies of a Klebsiella Phage Capsule Depolymerase Tailspike: Mechanistic Insights into Capsular Degradation","volume":"28","author":"F Squeglia","year":"2020","journal-title":"Structure"},{"issue":"3","key":"pcbi.1011831.ref002","doi-asserted-by":"crossref","first-page":"e00455","DOI":"10.1128\/mBio.00455-21","article-title":"Engineering the Modular Receptor-Binding Proteins of Klebsiella Phages Switches Their Capsule Serotype Specificity","volume":"12","author":"A Latka","year":"2021","journal-title":"mBio"},{"issue":"9","key":"pcbi.1011831.ref003","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/nrmicro2415","article-title":"The Biofilm Matrix","volume":"8","author":"HC Flemming","year":"2010","journal-title":"Nature Reviews Microbiology"},{"issue":"18","key":"pcbi.1011831.ref004","doi-asserted-by":"crossref","first-page":"e00920","DOI":"10.1128\/JVI.00920-21","article-title":"Novel Host Recognition Mechanism of the K1 Capsule-Specific Phage of Escherichia Coli: Capsular Polysaccharide as the First Receptor and Lipopolysaccharide as the Secondary Receptor","volume":"95","author":"Q Gong","year":"2021","journal-title":"Journal of Virology"},{"issue":"1","key":"pcbi.1011831.ref005","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1038\/nrmicro2695","article-title":"Should 
We Stay or Should We Go: Mechanisms and Ecological Consequences for Biofilm Dispersal","volume":"10","author":"D McDougald","year":"2012","journal-title":"Nature Reviews Microbiology"},{"key":"pcbi.1011831.ref006","doi-asserted-by":"crossref","first-page":"2517","DOI":"10.3389\/fmicb.2018.02517","article-title":"Phage-Borne Depolymerases Decrease Klebsiella Pneumoniae Resistance to Innate Defense Mechanisms","volume":"9","author":"G Majkowska-Skrobek","year":"2018","journal-title":"Frontiers in Microbiology"},{"issue":"8","key":"pcbi.1011831.ref007","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1016\/j.tim.2022.05.002","article-title":"Exploiting Phage-Derived Carbohydrate Depolymerases for Combating Infectious Diseases","volume":"30","author":"H Oliveira","year":"2022","journal-title":"Trends in Microbiology"},{"issue":"7","key":"pcbi.1011831.ref008","doi-asserted-by":"crossref","first-page":"e3001276","DOI":"10.1371\/journal.pbio.3001276","article-title":"Interplay between the Cell Envelope and Mobile Genetic Elements Shapes Gene Flow in Populations of the Nosocomial Pathogen Klebsiella Pneumoniae","volume":"19","author":"M Haudiquet","year":"2021","journal-title":"PLOS Biology"},{"issue":"1","key":"pcbi.1011831.ref009","doi-asserted-by":"crossref","first-page":"e01023","DOI":"10.1128\/Spectrum.01023-21","article-title":"Mechanistic Insights into the Capsule-Targeting Depolymerase from a Klebsiella Pneumoniae Bacteriophage","volume":"9","author":"RA Dunstan","year":"2021","journal-title":"Microbiology Spectrum"},{"key":"pcbi.1011831.ref010","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/S0065-2164(10)70007-1","article-title":"Bacteriophage Host Range and Bacterial Resistance","volume":"70","author":"P Hyman","year":"2010","journal-title":"Advances in Applied Microbiology"},{"key":"pcbi.1011831.ref011","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/B978-0-12-800259-9.00004-4","article-title":"Bacteria\u2014Phage 
Interactions in Natural Environments","volume":"89","author":"SL D\u00edaz-Mu\u00f1oz","year":"2014","journal-title":"Advances in Applied Microbiology"},{"key":"pcbi.1011831.ref012","doi-asserted-by":"crossref","first-page":"2949","DOI":"10.3389\/fmicb.2019.02949","article-title":"Diversity and Function of Phage Encoded Depolymerases","volume":"10","author":"LE Knecht","year":"2020","journal-title":"Frontiers in Microbiology"},{"issue":"11","key":"pcbi.1011831.ref013","doi-asserted-by":"crossref","first-page":"e1007845","DOI":"10.1371\/journal.pcbi.1007845","article-title":"PhANNs, a fast and accurate tool and web server to classify phage structural proteins","volume":"16","author":"VA Cantu","year":"2020","journal-title":"PLOS Computational Biology"},{"key":"pcbi.1011831.ref014","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1038\/s41564-023-01584-8","article-title":"Large language models improve annotation of prokaryotic viral proteins","volume":"9","author":"ZN Flamholz","year":"2024","journal-title":"Nat Microbiol"},{"key":"pcbi.1011831.ref015","doi-asserted-by":"crossref","first-page":"2649","DOI":"10.3389\/fmicb.2019.02649","article-title":"Modeling the Architecture of Depolymerase-Containing Receptor Binding Proteins in Klebsiella Phages","volume":"10","author":"A Latka","year":"2019","journal-title":"Frontiers in Microbiology"},{"issue":"1","key":"pcbi.1011831.ref016","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1186\/s12859-023-05341-w","article-title":"DePolymerase Predictor (DePP): A Machine Learning Tool for the Targeted Identification of Phage Depolymerases","volume":"24","author":"DJ Magill","year":"2023","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1011831.ref017","article-title":"PhageDPO: Phage Depolymerase Finder","author":"M 
Vieira","year":"2023","journal-title":"BioRxiv"},{"issue":"1","key":"pcbi.1011831.ref018","doi-asserted-by":"crossref","first-page":"1914","DOI":"10.1038\/s41467-022-29443-w","article-title":"Learning Meaningful Representations of Protein Sequences","volume":"13","author":"NS Detlefsen","year":"2022","journal-title":"Nature Communications"},{"issue":"15","key":"pcbi.1011831.ref019","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences","volume":"118","author":"A Rives","year":"2021","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"pcbi.1011831.ref020","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model","volume":"379","author":"Z Lin","year":"2023","journal-title":"Science"},{"key":"pcbi.1011831.ref021","doi-asserted-by":"crossref","unstructured":"Thurimella K, Mohamed AMT, Graham DB, Owens RM, La Rosa SL, Plichta DR, et al. Protein Language Models Uncover Carbohydrate-Active Enzyme Function in Metagenomics. BioRxiv [Preprint]. 
2023.","DOI":"10.1101\/2023.10.23.563620"},{"issue":"D1","key":"pcbi.1011831.ref022","doi-asserted-by":"crossref","first-page":"D571","DOI":"10.1093\/nar\/gkab1045","article-title":"The Carbohydrate-Active Enzyme Database: Functions and Literature","volume":"50","author":"E Drula","year":"2022","journal-title":"Nucleic Acids Research"},{"issue":"4","key":"pcbi.1011831.ref023","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1089\/phage.2021.0007","article-title":"INfrastructure for a PHAge REference Database: Identification of Large-Scale Biases in the Current Collection of Cultured Phage Genomes","volume":"2","author":"R Cook","year":"2021","journal-title":"PHAGE"},{"issue":"3","key":"pcbi.1011831.ref024","doi-asserted-by":"crossref","first-page":"lqab067","DOI":"10.1093\/nargab\/lqab067","article-title":"PHROG: Families of Prokaryotic Virus Proteins Clustered Using Remote Homology","volume":"3","author":"P Terzian","year":"2021","journal-title":"NAR Genomics and Bioinformatics"},{"issue":"5","key":"pcbi.1011831.ref025","doi-asserted-by":"crossref","first-page":"2141","DOI":"10.1007\/s00253-015-7247-0","article-title":"Bacteriophage-Encoded Depolymerases: Their Diversity and Biotechnological Applications","volume":"100","author":"DP Pires","year":"2016","journal-title":"Applied Microbiology and Biotechnology"},{"issue":"D1","key":"pcbi.1011831.ref026","doi-asserted-by":"crossref","first-page":"D418","DOI":"10.1093\/nar\/gkac993","article-title":"InterPro in 2022","volume":"51","author":"T Paysan-Lafosse","year":"2023","journal-title":"Nucleic Acids Research"},{"issue":"11","key":"pcbi.1011831.ref027","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 Enables Sensitive Protein Sequence Searching for the Analysis of Massive Data Sets","volume":"35","author":"M Steinegger","year":"2017","journal-title":"Nature 
Biotechnology"},{"issue":"1","key":"pcbi.1011831.ref028","doi-asserted-by":"crossref","first-page":"33964","DOI":"10.1038\/srep33964","article-title":"FAMSA: Fast and Accurate Multiple Sequence Alignment of Huge Protein Families","volume":"6","author":"S Deorowicz","year":"2016","journal-title":"Scientific Reports"},{"issue":"1","key":"pcbi.1011831.ref029","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1186\/s12859-019-3019-7","article-title":"HH-Suite3 for Fast Remote Homology Detection and Deep Protein Annotation","volume":"20","author":"M Steinegger","year":"2019","journal-title":"BMC Bioinformatics"},{"issue":"10","key":"pcbi.1011831.ref030","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/bioinformatics\/btm098","article-title":"UniRef: Comprehensive and Non-Redundant UniProt Reference Clusters","volume":"23","author":"BE Suzek","year":"2007","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1011831.ref031","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, Scalable Generation of High-quality Protein Multiple Sequence Alignments Using Clustal Omega","volume":"7","author":"F Sievers","year":"2011","journal-title":"Molecular Systems Biology"},{"issue":"13","key":"pcbi.1011831.ref032","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-Hit: A Fast Program for Clustering and Comparing Large Sets of Protein or Nucleotide Sequences","volume":"22","author":"W Li","year":"2006","journal-title":"Bioinformatics"},{"issue":"5814","key":"pcbi.1011831.ref033","doi-asserted-by":"crossref","first-page":"972","DOI":"10.1126\/science.1136800","article-title":"Clustering by Passing Messages Between Data Points","volume":"315","author":"BJ Frey","year":"2007","journal-title":"Science"},{"key":"pcbi.1011831.ref034","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1038\/s41587-023-01773-0","article-title":"Fast and Accurate Protein 
Structure Search with Foldseek","volume":"42","author":"M Van Kempen","year":"2023","journal-title":"Nature Biotechnology"},{"issue":"1","key":"pcbi.1011831.ref035","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.jmb.2010.01.028","article-title":"Structural Basis for the Recognition and Cleavage of Polysialic Acid by the Bacteriophage K1F Tailspike Protein EndoNF","volume":"397","author":"EC Schulz","year":"2010","journal-title":"Journal of Molecular Biology"},{"issue":"12","key":"pcbi.1011831.ref036","doi-asserted-by":"crossref","first-page":"6424","DOI":"10.3390\/v7122946","article-title":"Structure of the Receptor-Binding Carboxy-Terminal Domain of the Bacteriophage T5 L-Shaped Tail Fibre with and without Its Intra-Molecular Chaperone","volume":"7","author":"C Garcia-Doval","year":"2015","journal-title":"Viruses"},{"issue":"49","key":"pcbi.1011831.ref037","doi-asserted-by":"crossref","first-page":"17652","DOI":"10.1073\/pnas.0504782102","article-title":"Structure of a Group A Streptococcal Phage-Encoded Virulence Factor Reveals a Catalytically Active Triple-Stranded \u03b2-Helix","volume":"102","author":"NL Smith","year":"2005","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"pcbi.1011831.ref038","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1016\/j.jbiotec.2020.01.017","article-title":"Application of a Protein Domain as Chaperone for Enhancing Biological Activity and Stability of Other Proteins","volume":"310","author":"R Jena","year":"2020","journal-title":"Journal of Biotechnology"},{"issue":"W1","key":"pcbi.1011831.ref039","doi-asserted-by":"crossref","first-page":"W732","DOI":"10.1093\/nar\/gkac370","article-title":"SWORD2: Hierarchical Analysis of Protein 3D Structures","volume":"50","author":"G Cretin","year":"2022","journal-title":"Nucleic Acids Research"},{"key":"pcbi.1011831.ref040","doi-asserted-by":"crossref","unstructured":"Akiba T, Sano S, Yanase T, Ohta T, Koyama M. 
Optuna: A Next-Generation Hyperparameter Optimization Framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019; 2623\u201331.","DOI":"10.1145\/3292500.3330701"},{"issue":"10","key":"pcbi.1011831.ref041","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning","volume":"44","author":"A Elnaggar","year":"2020","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"6","key":"pcbi.1011831.ref042","doi-asserted-by":"crossref","first-page":"932","DOI":"10.1038\/s41587-021-01179-w","article-title":"Using Deep Learning to Annotate the Protein Universe","volume":"40","author":"ML Bileschi","year":"2022","journal-title":"Nature Biotechnology"},{"issue":"8","key":"pcbi.1011831.ref043","doi-asserted-by":"crossref","first-page":"4676","DOI":"10.3390\/v7082839","article-title":"Structure and Biophysical Properties of a Triple-Stranded Beta-Helix Comprising the Central Spike of Bacteriophage T4","volume":"7","author":"S Buth","year":"2015","journal-title":"Viruses"},{"issue":"7462","key":"pcbi.1011831.ref044","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/nature12453","article-title":"PAAR-Repeat Proteins Sharpen and Diversify the Type VI Secretion System Spike","volume":"500","author":"MM Shneider","year":"2013","journal-title":"Nature"},{"issue":"4","key":"pcbi.1011831.ref045","doi-asserted-by":"crossref","first-page":"101014","DOI":"10.1016\/j.jbc.2021.101014","article-title":"Structural Insights into the Mechanism of pH-Selective Substrate Specificity of the Polysaccharide Lyase Smlt1473","volume":"297","author":"S Pandey","year":"2021","journal-title":"Journal of Biological Chemistry"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1011831","type":"new_version","label":"New 
version","source":"publisher","updated":{"date-parts":[[2024,8,15]],"date-time":"2024-08-15T00:00:00Z","timestamp":1723680000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011831","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,15]],"date-time":"2024-08-15T13:54:35Z","timestamp":1723730075000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011831"}},"subtitle":[],"editor":[{"given":"Yang","family":"Lu","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,8,5]]},"references-count":45,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2024,8,5]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1011831","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.01.15.575807","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,5]]}}}