{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,8]],"date-time":"2026-05-08T07:55:28Z","timestamp":1778226928244,"version":"3.51.4"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2020,2,28]],"date-time":"2020-02-28T00:00:00Z","timestamp":1582848000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000038","name":"NSERC","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"name":"RGPIN"},{"name":"Genome Canada\/Genome BC and Simon Fraser University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Many methods for microbial protein subcellular localization (SCL) prediction exist; however, none is readily available for analysis of metagenomic sequence data, despite growing interest from researchers studying microbial communities in humans, agri-food relevant organisms and in other environments (e.g. for identification of cell-surface biomarkers for rapid protein-based diagnostic tests). We wished to also identify new markers of water quality from freshwater samples collected from pristine versus pollution-impacted watersheds.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We report PSORTm, the first bioinformatics tool designed for prediction of diverse bacterial and archaeal protein SCL from metagenomics data. PSORTm incorporates components of PSORTb, one of the most precise and widely used protein SCL predictors, with an automated classification by cell envelope. An evaluation using 5-fold cross-validation with in silico-fragmented sequences with known localization showed that PSORTm maintains PSORTb\u2019s high precision, while sensitivity increases proportionately with metagenomic sequence fragment length. PSORTm\u2019s read-based analysis was similar to PSORTb-based analysis of metagenome-assembled genomes (MAGs); however, the latter requires non-trivial manual classification of each MAG by cell envelope, and cannot make use of unassembled sequences. Analysis of the watershed samples revealed the importance of normalization and identified potential biomarkers of water quality. This method should be useful for examining a wide range of microbial communities, including human microbiomes, and other microbiomes of medical, environmental or industrial importance.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Documentation, source code and docker containers are available for running PSORTm locally at https:\/\/www.psort.org\/psortm\/ (freely available, open-source software under GNU General Public License Version 3).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa136","type":"journal-article","created":{"date-parts":[[2020,2,25]],"date-time":"2020-02-25T20:16:28Z","timestamp":1582661788000},"page":"3043-3048","source":"Crossref","is-referenced-by-count":18,"title":["PSORTm: a bacterial and archaeal protein subcellular localization prediction tool for metagenomics data"],"prefix":"10.1093","volume":"36","author":[{"given":"Michael A","family":"Peabody","sequence":"first","affiliation":[{"name":"Department of Molecular Biology and Biochemistry"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wing Yin Venus","family":"Lau","sequence":"additional","affiliation":[{"name":"Department of Molecular Biology and Biochemistry"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gemma R","family":"Hoad","sequence":"additional","affiliation":[{"name":"Department of Molecular Biology and Biochemistry"},{"name":"Research Computing Group , Simon Fraser University, Burnaby, BC V5A 1S6, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Baofeng","family":"Jia","sequence":"additional","affiliation":[{"name":"Department of Molecular Biology and Biochemistry"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Finlay","family":"Maguire","sequence":"additional","affiliation":[{"name":"Department of Computer Science , Dalhousie University, Halifax, NS B3H 4R2, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kristen L","family":"Gray","sequence":"additional","affiliation":[{"name":"Department of Molecular Biology and Biochemistry"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert G","family":"Beiko","sequence":"additional","affiliation":[{"name":"Department of Computer Science , Dalhousie University, Halifax, NS B3H 4R2, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fiona S L","family":"Brinkman","sequence":"additional","affiliation":[{"name":"Department of Molecular Biology and Biochemistry"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,2,28]]},"reference":[{"key":"2023013112031049100_btaa136-B82808800","first-page":"D517","article-title":"CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acid Research","author":"Alcock","year":"2020"},{"key":"2023013112031049100_btaa136-B1","doi-asserted-by":"crossref","first-page":"2114","DOI":"10.1093\/bioinformatics\/btu170","article-title":"Trimmomatic: a flexible trimmer for Illumina sequence data","volume":"30","author":"Bolger","year":"2014","journal-title":"Bioinformatics"},{"key":"2023013112031049100_btaa136-B2","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nmeth.3176","article-title":"Fast and sensitive protein alignment using DIAMOND","volume":"12","author":"Buchfink","year":"2015","journal-title":"Nat. Methods"},{"key":"2023013112031049100_btaa136-B3","doi-asserted-by":"crossref","first-page":"3613","DOI":"10.1093\/nar\/gkg602","article-title":"PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria","volume":"31","author":"Gardy","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023013112031049100_btaa136-B4","doi-asserted-by":"crossref","first-page":"617","DOI":"10.1093\/bioinformatics\/bti057","article-title":"PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis","volume":"21","author":"Gardy","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112031049100_btaa136-B5","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: a next-generation sequencing read simulator","volume":"28","author":"Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"2023013112031049100_btaa136-B6","doi-asserted-by":"crossref","first-page":"2223","DOI":"10.1093\/bioinformatics\/bts429","article-title":"Gene and translation initiation site prediction in metagenomic sequences","volume":"28","author":"Hyatt","year":"2012","journal-title":"Bioinformatics"},{"key":"2023013112031049100_btaa136-B22763973","article-title":"Sickle: a sliding-window, adaptive, quality-based trimming tool for FASTQ files (version 1.33). Available\u00a0at:","year":"2011)"},{"key":"2023013112031049100_btaa136-B8","doi-asserted-by":"crossref","first-page":"e9","DOI":"10.1093\/nar\/gkr1067","article-title":"Gene prediction with Glimmer on metagenomic sequences augmented by phylogenetic classification and clustering","volume":"40","author":"Kelley","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023013112031049100_btaa136-B9","volume-title":"Applied Bioinformatics for Public Health Microbiology Conference","author":"Lau","year":"2019"},{"key":"2023013112031049100_btaa136-B10","doi-asserted-by":"crossref","first-page":"21219","DOI":"10.1073\/pnas.0907586106","article-title":"Subcellular localization of marine bacterial alkaline phosphatases","volume":"106","author":"Luo","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013112031049100_btaa136-B11","doi-asserted-by":"crossref","first-page":"11257","DOI":"10.1038\/ncomms11257","article-title":"Fast and sensitive taxonomic classification for metagenomics with Kaiju","volume":"7","author":"Menzel","year":"2016","journal-title":"Nat. Commun"},{"key":"2023013112031049100_btaa136-B14","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1101\/gr.213959.116","article-title":"metaSPAdes: a new versatile metagenomic assembler","volume":"27","author":"Nurk","year":"2017","journal-title":"Genome Res"},{"key":"2023013112031049100_btaa136-B15","doi-asserted-by":"crossref","first-page":"D733","DOI":"10.1093\/nar\/gkv1189","article-title":"Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation","volume":"44","author":"O\u2019Leary","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023013112031049100_btaa136-B16","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1038\/s41564-017-0012-7","article-title":"Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life","volume":"2","author":"Parks","year":"2017","journal-title":"Nat. Microbiol"},{"key":"2023013112031049100_btaa136-B17","doi-asserted-by":"crossref","first-page":"D663","DOI":"10.1093\/nar\/gkv1271","article-title":"PSORTdb: expanding the bacteria and archaea protein subcellular localization database to better reflect diversity in cell envelope structures","volume":"44","author":"Peabody","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023013112031049100_btaa136-B18","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1038\/nmeth.1701","article-title":"SignalP 4.0: discriminating signal peptides from transmembrane regions","volume":"8","author":"Petersen","year":"2011","journal-title":"Nat. Methods"},{"key":"2023013112031049100_btaa136-B19","doi-asserted-by":"crossref","first-page":"D164","DOI":"10.1093\/nar\/gki027","article-title":"PSORTdb: a database of subcellular localizations for bacteria","volume":"33","author":"Rey","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023013112031049100_btaa136-B20","doi-asserted-by":"crossref","first-page":"e191","DOI":"10.1093\/nar\/gkq747","article-title":"FragGeneScan: predicting genes in short and error-prone reads","volume":"38","author":"Rho","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023013112031049100_btaa136-B24","doi-asserted-by":"crossref","first-page":"836","DOI":"10.1038\/s41564-018-0171-1","article-title":"Recovery of genomes from metagenomics via a dereplication, aggregation and scoring strategy","volume":"3","author":"Sieber","year":"2018","journal-title":"Nat. Microbiol"},{"key":"2023013112031049100_btaa136-B27","doi-asserted-by":"crossref","first-page":"W365","DOI":"10.1093\/nar\/gkh485","article-title":"Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations","volume":"32","author":"Szafron","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023013112031049100_btaa136-B29","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-16-S12-S1","article-title":"Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble","volume":"16 (Suppl. 12","author":"Wang","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023013112031049100_btaa136-B30","doi-asserted-by":"crossref","first-page":"D187","DOI":"10.1093\/nar\/gkj161","article-title":"The Universal Protein Resource (UniProt): an expanding universe of protein information","volume":"34","author":"Wu","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023013112031049100_btaa136-B31","doi-asserted-by":"crossref","first-page":"1608","DOI":"10.1093\/bioinformatics\/btq249","article-title":"PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes","volume":"26","author":"Yu","year":"2010","journal-title":"Bioinformatics"},{"key":"2023013112031049100_btaa136-B32","doi-asserted-by":"crossref","first-page":"D241","DOI":"10.1093\/nar\/gkq1093","article-title":"PSORTdb\u2013an expanded, auto-updated, user-friendly protein subcellular localization database for Bacteria and Archaea","volume":"39","author":"Yu","year":"2011","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa136\/32980722\/btaa136.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/10\/3043\/48990906\/bioinformatics_36_10_3043.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/10\/3043\/48990906\/bioinformatics_36_10_3043.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:14:04Z","timestamp":1675199644000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/10\/3043\/5766116"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,2,28]]},"references-count":25,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2020,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa136","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,5,15]]},"published":{"date-parts":[[2020,2,28]]}}}