{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T22:41:13Z","timestamp":1767825673097,"version":"3.49.0"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Modern proteomics studies utilize high-throughput mass spectrometers which can produce data at an astonishing rate. These big mass spectrometry (MS) datasets can easily reach peta-scale level creating storage and analytic problems for large-scale systems biology studies. Each spectrum consists of thousands of peaks which have to be processed to deduce the peptide. However, only a small percentage of peaks in a spectrum are useful for peptide deduction as most of the peaks are either noise or not useful for a given spectrum. This redundant processing of non-useful peaks is a bottleneck for streaming high-throughput processing of big MS data. One way to reduce the amount of computation required in a high-throughput environment is to eliminate non-useful peaks. Existing noise removing algorithms are limited in their data-reduction capability and are compute intensive making them unsuitable for big data and high-throughput environments. In this paper we introduce a novel low-complexity technique based on classification, quantization and sampling of MS peaks.<\/jats:p>\n               <jats:p>Results: We present a novel data-reductive strategy for analysis of Big MS data. Our algorithm, called MS-REDUCE, is capable of eliminating noisy peaks as well as peaks that do not contribute to peptide deduction before any peptide deduction is attempted. 
Our experiments have shown up to 100\u00d7 speed up over existing state of the art noise elimination algorithms while maintaining comparable high quality matches. Using our approach we were able to process a million spectra in just under an hour on a moderate server.<\/jats:p>\n               <jats:p>Availability and implementation: The developed tool and strategy has been made available to wider proteomics and parallel computing community and the code can be found at https:\/\/github.com\/pcdslab\/MSREDUCE<\/jats:p>\n               <jats:p>Contact: fahad.saeed@wmich.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw023","type":"journal-article","created":{"date-parts":[[2016,2,15]],"date-time":"2016-02-15T01:09:07Z","timestamp":1455498547000},"page":"1518-1526","source":"Crossref","is-referenced-by-count":24,"title":["MS-REDUCE: an ultrafast technique for reduction of big mass spectrometry data for high-throughput processing"],"prefix":"10.1093","volume":"32","author":[{"given":"Muaaz Gul","family":"Awan","sequence":"first","affiliation":[{"name":"1 Department of Electrical and Computer Engineering and"}]},{"given":"Fahad","family":"Saeed","sequence":"additional","affiliation":[{"name":"1 Department of Electrical and Computer Engineering and"},{"name":"2 Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008, USA"}]}],"member":"286","published-online":{"date-parts":[[2016,1,21]]},"reference":[{"key":"2023020112235868800_btw023-B1","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1038\/nature01511","article-title":"Mass spectrometry-based 
proteomics","volume":"422","author":"Aebersold","year":"2003","journal-title":"Nature"},{"key":"2023020112235868800_btw023-B3","author":"Awan","year":"2015"},{"key":"2023020112235868800_btw023-B4","doi-asserted-by":"crossref","first-page":"i49","DOI":"10.1093\/bioinformatics\/bth947","article-title":"Automatic quality assessment of peptide tandem mass spectra","volume":"20","author":"Bern","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020112235868800_btw023-B5","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1089\/106652799318300","article-title":"De novo peptide sequencing via tandem mass spectrometry","volume":"6","author":"Dancik","year":"1999","journal-title":"J. Comput. Biol"},{"key":"2023020112235868800_btw023-B6","doi-asserted-by":"crossref","first-page":"3871","DOI":"10.1021\/pr101196n","article-title":"Faster sequest searching for peptide identification from tandem mass spectra","volume":"10","author":"Diament","year":"2011","journal-title":"J. Proteome Res"},{"key":"2023020112235868800_btw023-B7","doi-asserted-by":"crossref","DOI":"10.1186\/1477-5956-7-9","article-title":"A novel approach to denoising ion trap tandem mass spectra","volume":"7","author":"Ding","year":"2009","journal-title":"Proteome Sci"},{"key":"2023020112235868800_btw023-B8","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1504\/IJDMB.2011.038578","article-title":"Svm-rfe based feature selection for tandem mass spectrum quality assessment","volume":"5","author":"Ding","year":"2011","journal-title":"Int. J. Data Min. Bioinf"},{"key":"2023020112235868800_btw023-B9","doi-asserted-by":"crossref","first-page":"2195","DOI":"10.1021\/pr070510t","article-title":"Linear discriminant analysis-based estimation of the false discovery rate for phosphopeptide identifications","volume":"7","author":"Du","year":"2008","journal-title":"J. 
Proteome Res"},{"key":"2023020112235868800_btw023-B10","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J. Am. Soc. Mass Spectrom"},{"key":"2023020112235868800_btw023-B11","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1002\/bmb.2004.494032020331","article-title":"An introduction to mass spectrometry applications in biological research","volume":"32","author":"Finehout","year":"2003","journal-title":"Biochem. Mol. Biol. Educ"},{"key":"2023020112235868800_btw023-B12","doi-asserted-by":"crossref","DOI":"10.1002\/pmic.200300486","article-title":"Preprocessing of tandem mass spectrometric data to support automatic protein identification","volume":"3","author":"Gentzel","year":"2003","journal-title":"Proteomics"},{"key":"2023020112235868800_btw023-B13","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1021\/ac0258913","article-title":"Intensity-based statistical scorer for tandem mass spectrometry","volume":"75","author":"Havilio","year":"2003","journal-title":"Anal. Chem"},{"key":"2023020112235868800_btw023-B2","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1074\/mcp.M113.034769","article-title":"The one hour yeast proteome","volume":"13","author":"Hebert","year":"2014","journal-title":"Mol. Cell Proteomics"},{"key":"2023020112235868800_btw023-B14","doi-asserted-by":"crossref","first-page":"7159","DOI":"10.1073\/pnas.0600895103","article-title":"Quantitative phosphoproteomics of vasopressin-sensitive renal cells: regulation of aquaporin-2 phosphorylation at two sites","volume":"103","author":"Hoffert","year":"2006","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023020112235868800_btw023-B15","doi-asserted-by":"crossref","first-page":"6168","DOI":"10.1021\/ac100975t","article-title":"Classification filtering strategy to improve the coverage and sensitivity of phosphoproteome analysis","volume":"82","author":"Jiang","year":"2010","journal-title":"Anal. Chem"},{"key":"2023020112235868800_btw023-B17","first-page":"923","article-title":"Semi-supervised learning for peptide identification from shotgun proteomics datasets","volume":"4","author":"Kall","year":"2007","journal-title":"J. Proteome Res"},{"key":"2023020112235868800_btw023-B18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1477-5956-10-S1-S12","article-title":"An unsupervised machine learning method for assessing quality of tandem mass spectra","volume":"10","author":"Lin","year":"2012","journal-title":"Proteome Sci"},{"key":"2023020112235868800_btw023-B16","first-page":"1","article-title":"Toxicological screening and quantitation using liquid chromatography\/time-of-flight mass spectrometry","volume":"1","author":"Linnet","year":"2013","journal-title":"J. Foren. Sci. 
Criminol"},{"key":"2023020112235868800_btw023-B19","doi-asserted-by":"crossref","first-page":"5117","DOI":"10.1002\/pmic.200500928","article-title":"Cleaning of raw peptide ms\/ms spectra: Improved protein identification following deconvolution of multiply charged peaks, isotope clusters, and removal of background noise","volume":"6","author":"Mujezinovic","year":"2006","journal-title":"Proteomics"},{"key":"2023020112235868800_btw023-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2164-11-S1-S13","article-title":"Reducing the haystack to find the needle: improved protein identification after fast elimination of non-interpretable peptide ms\/ms spectra and noise reduction","volume":"11","author":"Mujezinovic","year":"2010","journal-title":"BMC Genomics"},{"key":"2023020112235868800_btw023-B21","article-title":"Quality assessment of tandem mass spectra based on cumulative intensity normalization","volume":"5","author":"Na","year":"2007","journal-title":"J. Proteome Res"},{"key":"2023020112235868800_btw023-B22","doi-asserted-by":"crossref","first-page":"3022","DOI":"10.1021\/pr800127y","article-title":"Rapid and accurate peptide identification from tandem mass spectra","volume":"7","author":"Park","year":"2008","journal-title":"J. Proteome Res"},{"key":"2023020112235868800_btw023-B23","doi-asserted-by":"crossref","first-page":"3551","DOI":"10.1002\/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2","article-title":"Probability-based protein identification by searching sequence database using mass spectrometry data","volume":"20","author":"Perkins","year":"1999","journal-title":"Electrophoresis"},{"key":"2023020112235868800_btw023-B24","doi-asserted-by":"crossref","first-page":"13368","DOI":"10.1073\/pnas.0403453101","article-title":"Identification and proteomic profiling of exosomes in human urine","volume":"101","author":"Pisitkun","year":"2004","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023020112235868800_btw023-B25","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1089\/omi.2004.8.255","article-title":"Spectral quality assessment for high-throughput tandem mass spectrometry proteomics","volume":"8","author":"Purvine","year":"2004","journal-title":"OMICS: J. Integr. Biol"},{"key":"2023020112235868800_btw023-B26","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1109\/TCBB.2013.152","article-title":"Cams-rs: clustering algorithm for large-scale mass spectrometry data using restricted search space and intelligent random sampling","volume":"11","author":"Saeed","year":"2013","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinf"},{"key":"2023020112235868800_btw023-B27","first-page":"618","article-title":"An efficient dynamic programming algorithm for phosphorylation site assignment of large-scale mass spectrometry data","volume":"7","author":"Saeed","year":"2013","journal-title":"IEEE Int. Conf. Bioinf. Biomed. Workshops (BIBMW)"},{"key":"2023020112235868800_btw023-B28","doi-asserted-by":"crossref","first-page":"S14","DOI":"10.1186\/1477-5956-11-S1-S14","article-title":"Phossa: fast and accurate phosphorylation site assignment algorithm for mass spectrometry data","volume":"11","author":"Saeed","year":"2013","journal-title":"Proteome Sci"},{"key":"2023020112235868800_btw023-B29","doi-asserted-by":"crossref","DOI":"10.1021\/ac026424o","article-title":"Similarity among tandem mass spectra from proteomic experiments: Detection, significance, and utility","volume":"75","author":"Tabb","year":"2003","journal-title":"Anal. Chem"},{"key":"2023020112235868800_btw023-B30","author":"Wells","year":"2011"},{"key":"2023020112235868800_btw023-B31","first-page":"140","article-title":"An approach to assessing peptide mass spectral quality without prior information","volume":"1","author":"Wu","year":"2008","journal-title":"Int. J. Funct. Inf. Person. 
Med"},{"key":"2023020112235868800_btw023-B32","doi-asserted-by":"crossref","first-page":"1203","DOI":"10.1002\/rcm.3488","article-title":"Peakselect: preprocessing tandem mass spectra for better peptide identification","volume":"22","author":"Zhang","year":"2008","journal-title":"Rapid Commun. Mass Spectrom"},{"key":"2023020112235868800_btw023-B33","doi-asserted-by":"crossref","first-page":"3299","DOI":"10.1002\/pmic.201200189","article-title":"Cphos: a program to calculate and visualize evolutionarily conserved functional phosphorylation sites","volume":"12","author":"Zhao","year":"2012","journal-title":"Proteomics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/10\/1518\/49019095\/bioinformatics_32_10_1518.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/10\/1518\/49019095\/bioinformatics_32_10_1518.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:29:55Z","timestamp":1675290595000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/10\/1518\/1743195"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,1,21]]},"references-count":33,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2016,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw023","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,5,15]]},"published":{"date-parts":[[2016,1,21]]}}}