{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T09:47:14Z","timestamp":1763977634951},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"22","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,11,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Tandem mass spectrometry allows for high-throughput identification of complex protein samples. Searching tandem mass spectra against sequence databases is the main analysis method nowadays. Since many peptide variations are possible, including them in the search space seems only logical. However, the search space usually grows exponentially with the number of independent variations and may therefore overwhelm computational resources.<\/jats:p><jats:p>Results: We provide fast, cache-efficient search algorithms to screen large peptide search spaces including non-tryptic peptides, whole genomes, dozens of posttranslational modifications, unannotated point mutations and even unannotated splice sites. All these search spaces can be screened simultaneously. By optimizing the cache usage, we achieve a calculation speed that closely approaches the limits of the hardware. At the same time, we control the size of the overall search space by limiting the combinations of variations that can co-occur on the same peptide. Using a hypergeometric scoring scheme, we applied these algorithms to a dataset of 1 420 632 spectra. We were able to identify a considerable number of peptide variations within a modest amount of computing time on standard desktop computers.<\/jats:p><jats:p>Availability: PepSplice is available as a C++ application for Linux, Windows and OSX at www.ti.inf.ethz.ch\/pw\/software\/pepsplice\/. 
It is open source under the revised BSD license.<\/jats:p><jats:p>Contact: \u00a0franz.roos@alumni.ethz.ch or jacob@in.tum.de<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm417","type":"journal-article","created":{"date-parts":[[2007,9,4]],"date-time":"2007-09-04T00:25:29Z","timestamp":1188865529000},"page":"3016-3023","source":"Crossref","is-referenced-by-count":28,"title":["PepSplice: cache-efficient search algorithms for comprehensive identification of tandem mass spectra"],"prefix":"10.1093","volume":"23","author":[{"given":"Franz F.","family":"Roos","sequence":"first","affiliation":[{"name":"1 Institute of Theoretical Computer Science, 2Institute of Plant Science, 3Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland and 4Institut fuer Informatik, TU Munich, D-85748 Garching, Germany"}]},{"given":"Riko","family":"Jacob","sequence":"additional","affiliation":[{"name":"1 Institute of Theoretical Computer Science, 2Institute of Plant Science, 3Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland and 4Institut fuer Informatik, TU Munich, D-85748 Garching, Germany"}]},{"given":"Jonas","family":"Grossmann","sequence":"additional","affiliation":[{"name":"1 Institute of Theoretical Computer Science, 2Institute of Plant Science, 3Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland and 4Institut fuer Informatik, TU Munich, D-85748 Garching, Germany"}]},{"given":"Bernd","family":"Fischer","sequence":"additional","affiliation":[{"name":"1 Institute of Theoretical Computer Science, 2Institute of Plant Science, 3Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland and 4Institut fuer Informatik, TU Munich, D-85748 Garching, Germany"}]},{"given":"Joachim M.","family":"Buhmann","sequence":"additional","affiliation":[{"name":"1 Institute of Theoretical Computer Science, 2Institute of 
Plant Science, 3Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland and 4Institut fuer Informatik, TU Munich, D-85748 Garching, Germany"}]},{"given":"Wilhelm","family":"Gruissem","sequence":"additional","affiliation":[{"name":"1 Institute of Theoretical Computer Science, 2Institute of Plant Science, 3Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland and 4Institut fuer Informatik, TU Munich, D-85748 Garching, Germany"}]},{"given":"Sacha","family":"Baginsky","sequence":"additional","affiliation":[{"name":"1 Institute of Theoretical Computer Science, 2Institute of Plant Science, 3Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland and 4Institut fuer Informatik, TU Munich, D-85748 Garching, Germany"}]},{"given":"Peter","family":"Widmayer","sequence":"additional","affiliation":[{"name":"1 Institute of Theoretical Computer Science, 2Institute of Plant Science, 3Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland and 4Institut fuer Informatik, TU Munich, D-85748 Garching, Germany"}]}],"member":"286","published-online":{"date-parts":[[2007,9,3]]},"reference":[{"key":"2023041208262496900_","first-page":"31","article-title":"The input output complexity of sorting and related problems","author":"Aggarwal","year":"1988","journal-title":"Commun. ACM"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","DOI":"10.1074\/mcp.M600469-MCP200","article-title":"Comparative evaluation of tandem ms search algorithms using a target-decoy search strategy","author":"Balgley","year":"2007","journal-title":"Mol. Cell Proteomics"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","DOI":"10.1145\/369133.369176","article-title":"Gene-finding via tandem mass spectrometry. 
In","author":"Chen","year":"2001"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1002\/1615-9861(200104)1:5<651::AID-PROT651>3.0.CO;2-N","article-title":"Interrogating the human genome using uninterpreted mass spectrometry data","volume":"1","author":"Choudhary","year":"2001","journal-title":"Proteomics"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"1977","DOI":"10.1002\/pmic.200300708","article-title":"High-performance peptide identification by tandem mass spectrometry allows reliable automatic data processing in proteomics","volume":"4","author":"Colinge","year":"2004","journal-title":"Proteomics"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1021\/pr049811i","article-title":"Experiments in searching small proteins in unannotated large eukaryotic genomes","volume":"4","author":"Colinge","year":"2005","journal-title":"J. Proteome Res."},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"2310","DOI":"10.1002\/rcm.1198","article-title":"A method for reducing the time required to match protein sequences with tandem mass spectra","volume":"17","author":"Craig","year":"2003","journal-title":"Rapid Commun. 
Mass Spectrom."},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1093\/bioinformatics\/bth092","article-title":"Tandem: matching proteins with tandem mass spectra","volume":"20","author":"Craig","year":"2004","journal-title":"Bioinformatics"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1126\/science.1124619","article-title":"Mass spectrometry and protein analysis","volume":"312","author":"Domon","year":"2006","journal-title":"Science"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nmeth1019","article-title":"Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry","volume":"4","author":"Elias","year":"2007","journal-title":"Nat. Methods"},{"key":"2023041208262496900_","first-page":"667","article-title":"Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations","volume":"2","author":"Elias","year":"2005","journal-title":"Nat. Methods"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass-spectral data of peptides with amino-acid-sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J. Am. Soc. 
Mass Spectrom."},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"e132","DOI":"10.1093\/bioinformatics\/btl219","article-title":"Semi-supervised lc\/ms alignment for differential proteomics","volume":"22","author":"Fischer","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"3742","DOI":"10.1093\/nar\/gkg586","article-title":"Eugene's hom: a generic similarity-based gene finder using multiple homologous sequences","volume":"31","author":"Foissac","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023041208262496900_","article-title":"Cache-oblivious algorithms","author":"Frigo","year":"1999","journal-title":"Master Thesis"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"3475","DOI":"10.1002\/pmic.200500126","article-title":"An evaluation, comparison, and accurate benchmarking of several publicly available ms\/ms search algorithms: sensitivity and specificity analysis","volume":"5","author":"Kapp","year":"2005","journal-title":"Proteomics"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"5383","DOI":"10.1021\/ac025747h","article-title":"Empirical statistical model to estimate the accuracy of peptide identifications made by ms\/ms and database search","volume":"74","author":"Keller","year":"2002","journal-title":"Anal. 
Chem."},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1002\/1615-9861(200104)1:5<641::AID-PROT641>3.0.CO;2-R","article-title":"Mass spectrometry allows direct identification of proteins in large genomes","volume":"1","author":"Kuster","year":"2001","journal-title":"Proteomics"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btl379","article-title":"General framework for developing and evaluating database scoring algorithms using the tandem search engine","author":"Maclean","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"4103","DOI":"10.1093\/nar\/gkf543","article-title":"Current methods of gene prediction, their strengths and weaknesses","volume":"30","author":"Mathe","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023041208262496900_","first-page":"5","article-title":"Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides","author":"Nesvizhskii","year":"2006","journal-title":"Mol. 
Cell Proteomics"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"3551","DOI":"10.1002\/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2","article-title":"Probability-based protein identification by searching sequence databases using mass spectrometry data","volume":"20","author":"Perkins","year":"1999","journal-title":"Electrophoresis"},{"key":"2023041208262496900_","article-title":"Peptide assignment validation: telling what's wrong without actually knowing what's right","author":"Pletscher","year":"2006","journal-title":"Semester Thesis"},{"key":"2023041208262496900_","unstructured":"Roos FF Algorithms for peptide identification by tandem mass spectrometry Ph.D Thesis 16844 2006 ETH Zurich, http:\/\/e-collection.ethbib.ethz.ch\/ecol-pool\/diss\/fulltext\/eth16844.pdf"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"3792","DOI":"10.1021\/ac034157w","article-title":"A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases","volume":"75","author":"Sadygov","year":"2003","journal-title":"Anal. Chem"},{"key":"2023041208262496900_","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1038\/nmeth725","article-title":"Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book","volume":"1","author":"Sadygov","year":"2004","journal-title":"Nat. Methods"},{"key":"2023041208262496900_","first-page":"67","article-title":"Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases","author":"Yates","year":"1995","journal-title":"Anal. 
Chem"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/22\/3016\/49857962\/bioinformatics_23_22_3016.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/22\/3016\/49857962\/bioinformatics_23_22_3016.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,14]],"date-time":"2023-05-14T00:40:32Z","timestamp":1684024832000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/22\/3016\/207269"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,9,3]]},"references-count":27,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2007,11,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm417","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,11,15]]},"published":{"date-parts":[[2007,9,3]]}}}