{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,29]],"date-time":"2024-06-29T21:26:41Z","timestamp":1719696401807},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2016,10,1]],"date-time":"2016-10-01T00:00:00Z","timestamp":1475280000000},"content-version":"vor","delay-in-days":2320,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Since database retrieval is a fundamental operation, the measurement of retrieval efficacy is critical to progress in bioinformatics. This article points out some issues with current methods of measuring retrieval efficacy and suggests some improvements. In particular, many studies have used the pooled receiver operating characteristic for n irrelevant records (ROCn) score, the area under the ROC curve (AUC) of a \u2018pooled\u2019 ROC curve, truncated at n irrelevant records. Unfortunately, the pooled ROCn score does not faithfully reflect actual usage of retrieval algorithms. Additionally, a pooled ROCn score can be very sensitive to retrieval results from as little as a single query.<\/jats:p>\n               <jats:p>Methods: To replace the pooled ROCn score, we propose the Threshold Average Precision (TAP-k), a measure closely related to the well-known average precision in information retrieval, but reflecting the usage of E-values in bioinformatics. 
Furthermore, in addition to conditions previously given in the literature, we introduce three new criteria that an ideal measure of retrieval efficacy should satisfy.<\/jats:p>\n               <jats:p>Results: PSI-BLAST, GLOBAL, HMMER and RPS-BLAST provided examples of using the TAP-k and pooled ROCn scores to evaluate sequence retrieval algorithms. In particular, compelling examples using real data highlight the drawbacks of the pooled ROCn score, showing that it can produce evaluations skewing far from intuitive expectations. In contrast, the TAP-k satisfies most of the criteria desired in an ideal measure of retrieval efficacy.<\/jats:p>\n               <jats:p>Availability and Implementation: The TAP-k web server and downloadable Perl script are freely available at http:\/\/www.ncbi.nlm.nih.gov\/CBBresearch\/Spouge\/html.ncbi\/tap\/<\/jats:p>\n               <jats:p>Contact: \u00a0spouge@ncbi.nlm.nih.gov<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq270","type":"journal-article","created":{"date-parts":[[2010,5,27]],"date-time":"2010-05-27T02:45:03Z","timestamp":1274928303000},"page":"1708-1713","source":"Crossref","is-referenced-by-count":25,"title":["Threshold Average Precision (TAP-<i>k<\/i>): a measure of retrieval designed for bioinformatics"],"prefix":"10.1093","volume":"26","author":[{"given":"Hyrum D.","family":"Carroll","sequence":"first","affiliation":[{"name":"1 National Center for Biotechnology Information, Bethesda, MD 20894 and 2 University of Maryland, Baltimore County, Baltimore, MD 21250, USA"}]},{"given":"Maricel G.","family":"Kann","sequence":"additional","affiliation":[{"name":"1 National Center for Biotechnology Information, Bethesda, MD 20894 and 2 University of Maryland, Baltimore County, Baltimore, MD 21250, USA"}]},{"given":"Sergey L.","family":"Sheetlin","sequence":"additional","affiliation":[{"name":"1 National 
Center for Biotechnology Information, Bethesda, MD 20894 and 2 University of Maryland, Baltimore County, Baltimore, MD 21250, USA"}]},{"given":"John L.","family":"Spouge","sequence":"additional","affiliation":[{"name":"1 National Center for Biotechnology Information, Bethesda, MD 20894 and 2 University of Maryland, Baltimore County, Baltimore, MD 21250, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,5,26]]},"reference":[{"key":"2023012507582271200_B1","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1016\/0022-2496(75)90001-2","article-title":"Area above ordinal dominance graph and area below receiver operating characteristic graph","volume":"12","author":"Bamber","year":"1975","journal-title":"J. Math. Psychol."},{"key":"2023012507582271200_B2","doi-asserted-by":"crossref","first-page":"D301","DOI":"10.1093\/nar\/gkl971","article-title":"The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data","volume":"35","author":"Berman","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012507582271200_B3","doi-asserted-by":"crossref","first-page":"6073","DOI":"10.1073\/pnas.95.11.6073","article-title":"Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships","volume":"95","author":"Brenner","year":"1998","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012507582271200_B4","doi-asserted-by":"crossref","first-page":"2456","DOI":"10.1093\/bioinformatics\/btg349","article-title":"Assessing sequence comparison methods with the average precision criterion","volume":"19","author":"Chen","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012507582271200_B5","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1145\/1143844.1143874","article-title":"The Relationship Between Precision-Recall and ROC Curves","volume-title":"Proceedings of the 23rd International Conference on Machine learning.","author":"Davis","year":"2006"},{"key":"2023012507582271200_B6","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023012507582271200_B7","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"2023012507582271200_B8","doi-asserted-by":"crossref","first-page":"D281","DOI":"10.1093\/nar\/gkm960","article-title":"The Pfam protein families database","volume":"36","author":"Finn","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012507582271200_B9","doi-asserted-by":"crossref","first-page":"2177","DOI":"10.1093\/nar\/gkp1219","article-title":"Homologous over-extension: a challenge for iterative similarity searches","volume":"38","author":"Gonzalez","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012507582271200_B10","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1109\/JPROC.2002.805303","article-title":"Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison","volume":"90","author":"Green","year":"2002","journal-title":"Proc. 
IEEE"},{"key":"2023012507582271200_B11","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/S0097-8485(96)80004-0","article-title":"Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching","volume":"20","author":"Gribskov","year":"1996","journal-title":"Comput. Chem."},{"key":"2023012507582271200_B12","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1007\/s10994-009-5119-5","article-title":"Measuring classifier performance: a coherent alternative to the area under the ROC curve","volume":"77","author":"Hand","year":"2009","journal-title":"Mach. Learn."},{"key":"2023012507582271200_B13","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1186\/1471-2105-6-272","article-title":"Automated methods of predicting the function of biological sequences using GO and BLAST","volume":"6","author":"Jones","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012507582271200_B14","doi-asserted-by":"crossref","first-page":"4678","DOI":"10.1093\/nar\/gkm414","article-title":"The identification of complete domains within protein sequences using accurate E-values for semi-global alignment","volume":"35","author":"Kann","year":"2007","journal-title":"Nucleic Acids Res."},{"issue":"Suppl. 1","key":"2023012507582271200_B15","doi-asserted-by":"crossref","first-page":"S12","DOI":"10.1186\/1471-2148-7-S1-S12","article-title":"FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function","volume":"7","author":"Krishnamurthy","year":"2007","journal-title":"BMC Evol. 
Biol."},{"key":"2023012507582271200_B16","first-page":"123","article-title":"Precision-recall operating characteristic (P-ROC) curves in imprecise environments","volume-title":"Proceedings of 18th International Conference on Pattern Recognition","author":"Landgrebe","year":"2006"},{"key":"2023012507582271200_B17","first-page":"185","article-title":"Comparing evaluation metrics for sentence boundary detection","author":"Liu","year":"2007","journal-title":"IEEE Int Conf. Acoust. Speech Signal Process."},{"key":"2023012507582271200_B18","doi-asserted-by":"crossref","first-page":"D237","DOI":"10.1093\/nar\/gkl951","article-title":"CDD: a conserved domain database for interactive domain family analysis","volume":"35","author":"Marchler-Bauer","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012507582271200_B19","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.sbi.2005.05.005","article-title":"The limits of protein sequence comparison?","volume":"15","author":"Pearson","year":"2005","journal-title":"Curr. Opin. Struct. 
Biol."},{"key":"2023012507582271200_B20","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1101\/gr.199701","article-title":"Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature","volume":"12","author":"Raychaudhuri","year":"2002","journal-title":"Genome Res."},{"key":"2023012507582271200_B21","doi-asserted-by":"crossref","first-page":"1000","DOI":"10.1093\/bioinformatics\/15.12.1000","article-title":"IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices","volume":"15","author":"Schaffer","year":"1999","journal-title":"Bioinformatics"},{"key":"2023012507582271200_B22","doi-asserted-by":"crossref","first-page":"2994","DOI":"10.1093\/nar\/29.14.2994","article-title":"Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements","volume":"29","author":"Schaffer","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012507582271200_B23","doi-asserted-by":"crossref","first-page":"773","DOI":"10.1110\/ps.03328504","article-title":"Sensitivity and selectivity in protein structure comparison","volume":"13","author":"Sierk","year":"2004","journal-title":"Protein Sci."},{"key":"2023012507582271200_B24","doi-asserted-by":"crossref","DOI":"10.21236\/AD0656340","volume-title":"Effectiveness of Information Retrieval Methods.","author":"Swets","year":"1967"},{"key":"2023012507582271200_B25","doi-asserted-by":"crossref","first-page":"1285","DOI":"10.1126\/science.3287615","article-title":"Measuring the accuracy of diagnostic systems","volume":"240","author":"Swets","year":"1988","journal-title":"Science"},{"key":"2023012507582271200_B26","doi-asserted-by":"crossref","first-page":"798","DOI":"10.1093\/bioinformatics\/btn037","article-title":"ConFunc - functional annotation in the twilight 
zone","volume":"24","author":"Wass","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012507582271200_B27","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1016\/0306-4379(92)90019-J","article-title":"An information measure of retrieval performance","volume":"17","author":"Wilbur","year":"1992","journal-title":"Inf. Syst."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/14\/1708\/48852821\/bioinformatics_26_14_1708.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/14\/1708\/48852821\/bioinformatics_26_14_1708.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T07:58:44Z","timestamp":1674633524000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/14\/1708\/178241"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,5,26]]},"references-count":27,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2010,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq270","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,7,15]]},"published":{"date-parts":[[2010,5,26]]}}}