{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T09:10:12Z","timestamp":1778663412773,"version":"3.51.4"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: We reformulate the problem of comparing mass-spectra by mapping spectra to a vector space model. Our search method leverages a metric space indexing algorithm to produce an initial candidate set, which can be followed by any fine ranking scheme.<\/jats:p>\n               <jats:p>Results: We consider three distance measures integrated into a multi-vantage point index structure. Of these, a semi-metric fuzzy-cosine distance using peptide precursor mass constraints performs the best. The index acts as a coarse, lossless filter with respect to the SEQUEST and ProFound scoring schemes, reducing the number of distance computations and returned candidates for fine filtering to about 0.5% and 0.02% of the database respectively. The fuzzy cosine distance term improves specificity over a peptide precursor mass filter, reducing the number of returned candidates by an order of magnitude. Run time measurements suggest proportional speedups in overall search times. 
Using an implementation of ProFound's Bayesian score as an example of a fine filter on a test set of Escherichia coli protein fragmentation spectra, the top results of our sample system are consistent with that of SEQUEST.<\/jats:p>\n               <jats:p>Contact: smriti@cs.utexas.edu<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btl118","type":"journal-article","created":{"date-parts":[[2006,4,4]],"date-time":"2006-04-04T00:23:52Z","timestamp":1144110232000},"page":"1524-1531","source":"Crossref","is-referenced-by-count":29,"title":["A fast coarse filtering method for peptide identification by mass spectrometry"],"prefix":"10.1093","volume":"22","author":[{"given":"Smriti R.","family":"Ramakrishnan","sequence":"first","affiliation":[{"name":"Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rui","family":"Mao","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aleksey A.","family":"Nakorchevskiy","sequence":"additional","affiliation":[{"name":"Department of Chemistry and Biochemistry, The University of Texas at Austin, Austin, Texas 78712, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John T.","family":"Prince","sequence":"additional","affiliation":[{"name":"Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Willard S.","family":"Willard","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weijia","family":"Xu","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Edward M.","family":"Marcotte","sequence":"additional","affiliation":[{"name":"Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA"},{"name":"Department of Chemistry and Biochemistry, The University of Texas at Austin, Austin, Texas 78712, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel P.","family":"Miranker","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712, USA"},{"name":"Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2006,4,4]]},"reference":[{"key":"2023012408402423100_b1","first-page":"357","article-title":"Distance-based indexing for high-dimensional metric spaces","author":"Bozkaya","year":"1997"},{"key":"2023012408402423100_b2","doi-asserted-by":"crossref","first-page":"619","DOI":"10.1002\/pmic.200300612","article-title":"Evaluation of algorithms for protein identification from sequence databases using mass spectrometry data","volume":"4","author":"Chamrad","year":"2004","journal-title":"Proteomics"},{"key":"2023012408402423100_b3","first-page":"481","article-title":"Aligning two sequences within a specified diagonal band","volume":"8","author":"Chao","year":"1992","journal-title":"Comput. Appl. 
Biosci."},{"key":"2023012408402423100_b4","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1145\/502807.502808","article-title":"Searching in metric spaces","volume":"33","author":"Chavez","year":"2001","journal-title":"ACM Comp. Surv."},{"key":"2023012408402423100_b5","first-page":"75","article-title":"Computational geometry: a retrospective","author":"Chazelle","year":"1994"},{"key":"2023012408402423100_b6","doi-asserted-by":"crossref","first-page":"2871","DOI":"10.1021\/ac9810516","article-title":"Role of accurate mass measurement (\u00b1 10 p.p.m) in protein identification strategies employing MS or MS\/MS and database searching","volume":"71","author":"Clauser","year":"1999","journal-title":"Anal. Chem."},{"key":"2023012408402423100_b7","article-title":"A Survey of Information Retrieval and Filtering Methods","volume-title":"Technical Report","author":"Faloutsos","year":"1996"},{"key":"2023012408402423100_b8","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1007\/978-3-662-03493-4_4","article-title":"The importance of co- and post-translational modifications in proteome projects","volume-title":"Proteome Research: New Frontiers in Functional Genomics","author":"Gooley","year":"1997"},{"key":"2023012408402423100_b9","first-page":"5383","article-title":"Empirical statistical model to estimate the accuracy of peptide identifications made by ms\/ms and database search","volume-title":"Anal. Chem.","author":"Keller","year":"2002"},{"key":"2023012408402423100_b10","first-page":"406","article-title":"Exact indexing of dynamic time warping","author":"Keogh","year":"2002"},{"key":"2023012408402423100_b11","doi-asserted-by":"crossref","first-page":"4390","DOI":"10.1021\/ac00096a002","article-title":"Error-tolerant identification of peptides in sequence databases by peptide sequence tags","volume":"66","author":"Mann","year":"1994","journal-title":"Anal. 
Chem."},{"key":"2023012408402423100_b12","first-page":"351","article-title":"On optimizing distance-based similarity search for biological databases","author":"Mao","year":"2005"},{"key":"2023012408402423100_b13","first-page":"241","article-title":"Mobios: a metric-space dbms to support biological discovery","author":"Miranker","year":"2003"},{"key":"2023012408402423100_b14","doi-asserted-by":"crossref","first-page":"4646","DOI":"10.1021\/ac0341261","article-title":"A statistical model for identifying proteins by tandem mass spectrometry","volume":"75","author":"Nesvizhskii","year":"2003","journal-title":"Anal. Chem."},{"key":"2023012408402423100_b15","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/0960-9822(93)90195-T","article-title":"Rapid identification of proteins by peptide-mass fingerprinting","volume":"3","author":"Pappin","year":"1993","journal-title":"Curr. Biol."},{"key":"2023012408402423100_b16","doi-asserted-by":"crossref","first-page":"3551","DOI":"10.1002\/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2","article-title":"Probability-based protein identification by searching sequence databases using mass spectrometry data","volume":"20","author":"Perkins","year":"1999","journal-title":"Electrophoresis"},{"key":"2023012408402423100_b17","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1101\/gr.154101","article-title":"Efficiency of database search for identification of mutated and modified proteins via mass spectrometry","volume":"11","author":"Pevzner","year":"2001","journal-title":"Genome Res."},{"key":"2023012408402423100_b18","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1038\/nbt0404-471","article-title":"The need for a public proteomics repository","volume":"22","author":"Prince","year":"2004","journal-title":"Nat. 
Biotechnol."},{"key":"2023012408402423100_b19","volume-title":"Database Management Systems","author":"Ramakrishnan","year":"2002"},{"key":"2023012408402423100_b20","first-page":"125","article-title":"Distance based indexing for string proximity search","author":"Sahinalp","year":"2003"},{"key":"2023012408402423100_b21","doi-asserted-by":"crossref","DOI":"10.1109\/TASSP.1978.1163055","article-title":"A dynamic programming algorithm optimization for spoken word recognition","volume":"26","author":"Sakoe","year":"1978","journal-title":"IEEE Trans. Acoustics Speech Signal Proc."},{"key":"2023012408402423100_b22","volume-title":"Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison","author":"Sankoff","year":"1983"},{"key":"2023012408402423100_b23","first-page":"183","article-title":"Metric indexing for the vector model in text retrieval","author":"Skopal","year":"2004"},{"key":"2023012408402423100_b24","first-page":"426","article-title":"Tools and techniques for color image retrieval","author":"Smith","year":"1996"},{"key":"2023012408402423100_b25","first-page":"389","article-title":"Cafe: an indexed approach to searching genomic databases","author":"Williams","year":"1998"},{"key":"2023012408402423100_b26","article-title":"Indexing protein sequences in metric space","volume-title":"Technical report","author":"Xu","year":"2003"},{"key":"2023012408402423100_b27","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1093\/bioinformatics\/bth929","article-title":"Using mobios' scalable genome join to find conserved primer pair candidates between two genomes","volume":"20","author":"Xu","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012408402423100_b28","doi-asserted-by":"crossref","first-page":"1426","DOI":"10.1021\/ac00104a020","article-title":"Method to correlate tandem mass spectral data of modified peptides to amino acid sequences in the protein 
database","volume":"67","author":"Yates","year":"1995","journal-title":"Anal. Chem."},{"key":"2023012408402423100_b29","doi-asserted-by":"crossref","first-page":"2482","DOI":"10.1021\/ac991363o","article-title":"ProFound\u2014an expert system for protein identification using mass spectrometric peptide mapping information","volume":"72","author":"Zhang","year":"2000","journal-title":"Anal. Chem."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/12\/1524\/48838207\/bioinformatics_22_12_1524.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/12\/1524\/48838207\/bioinformatics_22_12_1524.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T08:55:12Z","timestamp":1674550512000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/22\/12\/1524\/207367"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,4,4]]},"references-count":29,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2006,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl118","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2006,6,15]]},"published":{"date-parts":[[2006,4,4]]}}}