{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:15Z","timestamp":1772138055029,"version":"3.50.1"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"Supplement_2","license":[{"start":{"date-parts":[[2022,9,1]],"date-time":"2022-09-01T00:00:00Z","timestamp":1661990400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,16]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Target-decoy competition (TDC) is a commonly used method for false discovery rate (FDR) control in the analysis of tandem mass spectrometry data. This type of competition-based FDR control has recently gained significant popularity in other fields after Barber and Cand\u00e8s laid its theoretical foundation in a more general setting that included the feature selection problem. In both cases, the competition is based on a head-to-head comparison between an (observed) target score and a corresponding decoy (knockoff) score. However, the effectiveness of TDC depends on whether the data are homogeneous, which is often not the case: in many settings, the data consist of groups with different score profiles or different proportions of true nulls. In such cases, applying TDC while ignoring the group structure often yields imbalanced lists of discoveries, where some groups might include relatively many false discoveries and other groups include relatively very few. 
On the other hand, as we show, the alternative approach of applying TDC separately to each group does not rigorously control the FDR.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We developed Group-walk, a procedure that controls the FDR in the target-decoy\/knockoff setting while taking into account a given group structure. Group-walk is derived from the recently developed AdaPT\u2014a general framework for controlling the FDR with side-information. We show using simulated and real datasets that when the data naturally divide into groups with different characteristics Group-walk can deliver consistent power gains that in some cases are substantial. These groupings include the precursor charge state (4% more discovered peptides at 1% FDR threshold), the peptide length (3.6% increase) and the mass difference due to modifications (26% increase).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Group-walk is available at https:\/\/cran.r-project.org\/web\/packages\/groupwalk\/index.html.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac471","type":"journal-article","created":{"date-parts":[[2022,8,24]],"date-time":"2022-08-24T13:52:29Z","timestamp":1661349149000},"page":"ii82-ii88","source":"Crossref","is-referenced-by-count":9,"title":["Group-walk: a rigorous approach to group-wise false discovery rate analysis by target-decoy competition"],"prefix":"10.1093","volume":"38","author":[{"given":"Jack","family":"Freestone","sequence":"first","affiliation":[{"name":"School of 
Mathematics and Statistics F07, University of Sydney , Sydney 2006, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Temana","family":"Short","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics F07, University of Sydney , Sydney 2006, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"William Stafford","family":"Noble","sequence":"additional","affiliation":[{"name":"Department of Genome Sciences, University of Washington , Seattle 98195-4550, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Uri","family":"Keich","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics F07, University of Sydney , Sydney 2006, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,9,18]]},"reference":[{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"1795","DOI":"10.1074\/mcp.M110.000422","article-title":"Improving software performance for peptide electron transfer dissociation data analysis by implementation of charge state- and sequence-dependent scoring","volume":"9","author":"Baker","year":"2010","journal-title":"Mol. Cell. Proteomics"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"2055","DOI":"10.1214\/15-AOS1337","article-title":"Controlling the false discovery rate via knockoffs","volume":"43","author":"Barber","year":"2015","journal-title":"Ann. Stat"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. 
Series B"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"2791","DOI":"10.1074\/mcp.M115.055103","article-title":"Systematic errors in peptide and protein identification and quantification by modified peptides","volume":"15","author":"Bogdanow","year":"2016","journal-title":"Mol. Cell. Proteomics"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1111\/rssb.12265","article-title":"Panning for gold: model-X knockoffs for high-dimensional controlled variable selection","author":"Cand\u00e8s","year":"2018","journal-title":"J. R. Stat. Soc. Series B"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"2265","DOI":"10.1021\/pr901023v","article-title":"MUDE: a new approach for optimizing sensitivity in the target-decoy search strategy for large-scale peptide\/protein identification","volume":"9","author":"Cerqueira","year":"2010","journal-title":"J. Proteome Res"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1214\/07-AOAS141","article-title":"Simultaneous inference: when should hypothesis testing problems be combined?","volume":"2","author":"Efron","year":"2008","journal-title":"Ann. Appl. Stat"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nmeth1019","article-title":"Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry","volume":"4","author":"Elias","year":"2007","journal-title":"Nat. Methods"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1007\/978-1-60761-444-9_5","article-title":"Target-decoy search strategy for mass spectrometry-based proteomics","volume":"604","author":"Elias","year":"2010","journal-title":"Methods Mol. 
Biol"},{"key":"2023041408002531300_","first-page":"54","author":"Emery","year":"2020"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J. Am. Soc. Mass Spectrom"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"1359","DOI":"10.1074\/mcp.O113.030189","article-title":"Transferred subgroup false discovery rate for rare post-translational modifications detected by mass spectrometry","volume":"13","author":"Fu","year":"2014","journal-title":"Mol. Cell. Proteomics"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.jprot.2012.12.007","article-title":"Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics","volume":"80","author":"Granholm","year":"2013","journal-title":"J. Proteomics"},{"key":"2023041408002531300_","article-title":"A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics","author":"He","year":"2015","journal-title":"arXiv"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1002\/mas.20068","article-title":"Automated protein identification by tandem mass spectrometry: issues and strategies","volume":"25","author":"Hernandez","year":"2006","journal-title":"Mass Spectrom. 
Rev"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-13-S16-S2","article-title":"False discovery rates in spectral identification","volume":"13","author":"Jeong","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"1147","DOI":"10.1021\/pr5010983","article-title":"On the importance of well calibrated scores for identifying shotgun proteomics spectra","volume":"14","author":"Keich","year":"2015","journal-title":"J. Proteome Res"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"973","DOI":"10.1080\/01621459.2017.1375931","article-title":"Controlling the FDR in imperfect database matches applied to tandem mass spectrum identification","volume":"113","author":"Keich","year":"2017","journal-title":"J. Am. Stat. Assoc"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"3148","DOI":"10.1021\/acs.jproteome.5b00081","article-title":"Improved false discovery rate estimation procedure for shotgun proteomics","volume":"14","author":"Keich","year":"2015","journal-title":"J. Proteome Res"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1111\/rssb.12274","article-title":"AdaPT: an interactive procedure for multiple testing with side information","volume":"80","author":"Lei","year":"2018","journal-title":"J. R. Stat. Soc. Series B Stat. Methodol"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"2092","DOI":"10.1016\/j.jprot.2010.08.009","article-title":"A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics","volume":"73","author":"Nesvizhskii","year":"2010","journal-title":"J. 
Proteomics"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"e1002296","DOI":"10.1371\/journal.pcbi.1002296","article-title":"Computational and statistical analysis of protein mass spectrometry data","volume":"8","author":"Noble","year":"2012","journal-title":"PLoS Comput. Biol"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"3022","DOI":"10.1021\/pr800127y","article-title":"Rapid and accurate peptide identification from tandem mass spectra","volume":"7","author":"Park","year":"2008","journal-title":"J. Proteome Res"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"D442","DOI":"10.1093\/nar\/gky1106","article-title":"The PRIDE database and related tools and resources in 2019: improving support for quantification data","volume":"47","author":"Perez-Riverol","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1111\/1467-9868.00346","article-title":"A direct approach to false discovery rates","volume":"64","author":"Storey","year":"2002","journal-title":"J. R. Stat. Soc. 
Series B"},{"key":"2023041408002531300_","doi-asserted-by":"crossref","first-page":"2461","DOI":"10.1002\/pmic.201500431","article-title":"How to talk about protein-level false discovery rates in shotgun proteomics","volume":"16","author":"The","year":"2016","journal-title":"Proteomics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_2\/ii82\/49886292\/btac471.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_2\/ii82\/49886292\/btac471.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,2]],"date-time":"2024-10-02T10:09:08Z","timestamp":1727863748000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/Supplement_2\/ii82\/6701992"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,1]]},"references-count":26,"journal-issue":{"issue":"Supplement_2","published-print":{"date-parts":[[2022,9,16]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac471","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.01.30.478144","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9,1]]},"published":{"date-parts":[[2022,9,1]]}}}