{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T14:41:31Z","timestamp":1776004891370,"version":"3.50.1"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T00:00:00Z","timestamp":1686268800000},"content-version":"vor","delay-in-days":8,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Deep learning has moved to the forefront of tandem mass spectrometry-driven proteomics and authentic prediction for peptide fragmentation is more feasible than ever. Still, at this point spectral prediction is mainly used to validate database search results or for confined search spaces. Fully predicted spectral libraries have not yet been efficiently adapted to large search space problems that often occur in metaproteomics or proteogenomics.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this study, we showcase a workflow that uses Prosit for spectral library predictions on two common metaproteomes and implement an indexing and search algorithm, Mistle, to efficiently identify experimental mass spectra within the library. Hence, the workflow emulates a classic protein sequence database search with protein digestion but builds a searchable index from spectral predictions as an in-between step. We compare Mistle to popular search engines, both on a spectral and database search level, and provide evidence that this approach is more accurate than a database search using MSFragger. 
Mistle outperforms other spectral library search engines in terms of run time and proves to be extremely memory efficient with a 4- to 22-fold decrease in RAM usage. This makes Mistle universally applicable to large search spaces, e.g. covering comprehensive sequence databases of diverse microbiomes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Mistle is freely available on GitHub at https:\/\/github.com\/BAMeScience\/Mistle.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad376","type":"journal-article","created":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T14:05:07Z","timestamp":1686319507000},"source":"Crossref","is-referenced-by-count":10,"title":["Mistle: bringing spectral library predictions to metaproteomics with an efficient search index"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7537-3694","authenticated-orcid":false,"given":"Yannek","family":"Nowatzky","sequence":"first","affiliation":[{"name":"Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM) , Berlin 12205, Germany"}]},{"given":"Philipp","family":"Benner","sequence":"additional","affiliation":[{"name":"Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM) , Berlin 12205, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3078-8129","authenticated-orcid":false,"given":"Knut","family":"Reinert","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, FU Berlin , Berlin 14195, Germany"},{"name":"Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics , Berlin 14195, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8304-2684","authenticated-orcid":false,"given":"Thilo","family":"Muth","sequence":"additional","affiliation":[{"name":"Section S.3 eScience, Federal 
Institute for Materials Research and Testing (BAM) , Berlin 12205, Germany"}]}],"member":"286","published-online":{"date-parts":[[2023,6,9]]},"reference":[{"key":"2023070312004298200_btad376-B1","doi-asserted-by":"crossref","first-page":"1363","DOI":"10.1038\/s41592-021-01301-5","article-title":"DeepLC can predict retention times for peptides that carry as-yet unseen modifications","volume":"18","author":"Bouwmeester","year":"2021","journal-title":"Nat Methods"},{"key":"2023070312004298200_btad376-B2","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1016\/B978-0-12-409548-9.11222-9","article-title":"Microbial communities","volume":"1","author":"Callieri","year":"2019","journal-title":"Encyclopedia of Ecology"},{"key":"2023070312004298200_btad376-B3","doi-asserted-by":"crossref","first-page":"2305","DOI":"10.1021\/pr301039b","article-title":"Spectrum-based method to generate good decoy libraries for spectral library searching in peptide identifications","volume":"12","author":"Cheng","year":"2013","journal-title":"J Proteome Res"},{"key":"2023070312004298200_btad376-B4","doi-asserted-by":"crossref","first-page":"519","DOI":"10.2144\/05384TE01","article-title":"Tandem mass spectrometry for peptide and protein sequence analysis","volume":"38","author":"Coon","year":"2005","journal-title":"Biotechniques"},{"key":"2023070312004298200_btad376-B5","doi-asserted-by":"crossref","first-page":"1794","DOI":"10.1021\/pr101065j","article-title":"Andromeda: a peptide search engine integrated into the MaxQuant environment","volume":"10","author":"Cox","year":"2011","journal-title":"J Proteome Res"},{"key":"2023070312004298200_btad376-B6","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1093\/bioinformatics\/bth092","article-title":"TANDEM: matching proteins with tandem mass 
spectra","volume":"20","author":"Craig","year":"2004","journal-title":"Bioinformatics"},{"key":"2023070312004298200_btad376-B7","doi-asserted-by":"crossref","first-page":"i766","DOI":"10.1093\/bioinformatics\/bty567","article-title":"DREAM-Yara: an exact read mapper for very large databases with short update time","volume":"34","author":"Dadi","year":"2018","journal-title":"Bioinformatics"},{"key":"2023070312004298200_btad376-B8","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1002\/prca.201400164","article-title":"Trans-proteomic pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics","volume":"9","author":"Deutsch","year":"2015","journal-title":"Proteomics Clin Appl"},{"key":"2023070312004298200_btad376-B9","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1038\/s41592-019-0426-7","article-title":"Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning","volume":"16","author":"Gessulat","year":"2019","journal-title":"Nat Methods"},{"key":"2023070312004298200_btad376-B10","doi-asserted-by":"crossref","first-page":"4203","DOI":"10.1021\/ac303053e","article-title":"Metaproteomics: harnessing the power of high performance mass spectrometry to identify the suite of proteins that control metabolic activities in microbial communities","volume":"85","author":"Hettich","year":"2013","journal-title":"Anal Chem"},{"key":"2023070312004298200_btad376-B11","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1038\/nmeth1113","article-title":"Semi-supervised learning for peptide identification from shotgun proteomics datasets","volume":"4","author":"K\u00e4ll","year":"2007","journal-title":"Nat Methods"},{"key":"2023070312004298200_btad376-B12","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1038\/nmeth.4256","article-title":"MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based 
proteomics","volume":"14","author":"Kong","year":"2017","journal-title":"Nat Methods"},{"key":"2023070312004298200_btad376-B13","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1080\/19490976.2019.1702431","article-title":"Following the community development of SIHUMIx\u2014a new intestinal in vitro model for bioreactor use","volume":"11","author":"Krause","year":"2020","journal-title":"Gut Microbes"},{"key":"2023070312004298200_btad376-B14","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1002\/pmic.200600625","article-title":"Development and validation of a spectral library searching method for peptide identification from MS\/MS","volume":"7","author":"Lam","year":"2007","journal-title":"Proteomics"},{"key":"2023070312004298200_btad376-B15","doi-asserted-by":"crossref","first-page":"3439","DOI":"10.1002\/pmic.201400560","article-title":"Navigating through metaproteomics data: a logbook of database searching","volume":"15","author":"Muth","year":"2015","journal-title":"Proteomics"},{"key":"2023070312004298200_btad376-B16","doi-asserted-by":"crossref","first-page":"2092","DOI":"10.1016\/j.jprot.2010.08.009","article-title":"A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics","volume":"73","author":"Nesvizhskii","year":"2010","journal-title":"J Proteomics"},{"key":"2023070312004298200_btad376-B17","doi-asserted-by":"crossref","first-page":"5527","DOI":"10.1007\/s12035-015-9456-z","article-title":"Cellular signature of SIL1 depletion: disease pathogenesis due to alterations in protein composition beyond the ER machinery","volume":"53","author":"Roos","year":"2016","journal-title":"Mol Neurobiol"},{"key":"2023070312004298200_btad376-B18","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1080\/14789450.2019.1609944","article-title":"Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data 
analysis","volume":"16","author":"Schiebenhoefer","year":"2019","journal-title":"Expert Rev Proteomics"},{"key":"2023070312004298200_btad376-B19","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1016\/B978-0-12-410472-3.00005-1","volume-title":"Metagenomics for Microbiology","author":"Scholz","year":"2015"},{"key":"2023070312004298200_btad376-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-15346-1","article-title":"Generating high quality libraries for DIA MS with empirically corrected peptide predictions","volume":"11","author":"Searle","year":"2020","journal-title":"Nat Commun"},{"key":"2023070312004298200_btad376-B21","doi-asserted-by":"crossref","first-page":"e82981","DOI":"10.1371\/journal.pone.0082981","article-title":"Evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture","volume":"8","author":"Tanca","year":"2013","journal-title":"PLoS ONE"},{"key":"2023070312004298200_btad376-B22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40168-014-0049-2","article-title":"A straightforward and efficient analytical pipeline for metaproteome characterization","volume":"2","author":"Tanca","year":"2014","journal-title":"Microbiome"},{"key":"2023070312004298200_btad376-B23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40168-017-0293-3","article-title":"Potential and active functions in the gut microbiota of a healthy human cohort","volume":"5","author":"Tanca","year":"2017","journal-title":"Microbiome"},{"key":"2023070312004298200_btad376-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-021-27542-8","article-title":"Critical assessment of metaproteome investigation (CAMPI): a multi-laboratory comparison of established workflows","volume":"12","author":"Van Den Bossche","year":"2021","journal-title":"Nat 
Commun"},{"key":"2023070312004298200_btad376-B25","doi-asserted-by":"crossref","first-page":"100076","DOI":"10.1016\/j.mcpro.2021.100076","article-title":"Spectral prediction features as a solution for the search space size problem in proteogenomics","volume":"20","author":"Verbruggen","year":"2021","journal-title":"Mol Cell Proteomics"},{"key":"2023070312004298200_btad376-B26","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1002\/mas.21543","article-title":"Anatomy and evolution of database search engines\u2014a central component of mass spectrometry based proteomic workflows","volume":"39","author":"Verheggen","year":"2020","journal-title":"Mass Spectrom Rev"},{"key":"2023070312004298200_btad376-B27","doi-asserted-by":"crossref","first-page":"2000002","DOI":"10.1002\/pmic.202000002","article-title":"A fast and memory-efficient spectral library search algorithm using locality-sensitive hashing","volume":"20","author":"Wang","year":"2020","journal-title":"Proteomics"},{"key":"2023070312004298200_btad376-B28","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1111\/j.1462-2920.2004.00687.x","article-title":"The application of two-dimensional polyacrylamide gel electrophoresis and downstream analyses to a mixed community of prokaryotic microorganisms","volume":"6","author":"Wilmes","year":"2004","journal-title":"Environ Microbiol"},{"key":"2023070312004298200_btad376-B29","doi-asserted-by":"crossref","first-page":"176","DOI":"10.4172\/jpb.1000404","article-title":"DecoyPYrat: fast non-redundant hybrid decoy sequence generation for large scale proteomics","volume":"9","author":"Wright","year":"2016","journal-title":"J Proteomics Bioinform"},{"key":"2023070312004298200_btad376-B30","doi-asserted-by":"crossref","first-page":"12690","DOI":"10.1021\/acs.analchem.7b02566","article-title":"pDeep: predicting MS\/MS spectra of peptides with deep learning","volume":"89","author":"Zhou","year":"2017","journal-title":"Anal 
Chem"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad376\/50563067\/btad376.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/6\/btad376\/50784279\/btad376.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/6\/btad376\/50784279\/btad376.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,3]],"date-time":"2023-07-03T08:01:29Z","timestamp":1688371289000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad376\/7192987"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,6,1]]},"references-count":30,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad376","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.09.09.507252","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,6,1]]},"published":{"date-parts":[[2023,6,1]]},"article-number":"btad376"}}