{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T00:36:39Z","timestamp":1774053399897,"version":"3.50.1"},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2024,12,4]],"date-time":"2024-12-04T00:00:00Z","timestamp":1733270400000},"content-version":"vor","delay-in-days":6,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The Basic Local Alignment Search Tool, BLAST, is an indispensable tool for genomic research. BLAST has established itself as the canonical tool for sequence similarity search in large part thanks to its meaningful statistical analysis. Specifically, BLAST reports the E-value of each reported alignment, which is defined as the expected number of optimal local alignments that will score at least as high as the observed alignment score, assuming that the query and the database sequences are randomly generated.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we critically evaluate the E-values provided by the standard protein BLAST (blastp), showing that they can be at times significantly conservative while at others too liberal. We offer an alternative approach based on generating a small sample from the null distribution of random optimal alignments, and testing whether the observed alignment score is consistent with it. In contrast with blastp, our significance analysis seems valid, in the sense that it did not deliver inflated significance estimates in any of our extensive experiments. Moreover, although our method is slightly conservative, it is often significantly less so than the blastp E-value. Indeed, in cases where blastp\u2019s analysis is valid (i.e., not too liberal), our approach seems to deliver a greater number of correct alignments. One advantage of our approach is that it works with any reasonable choice of substitution matrix and gap penalties, avoiding blastp\u2019s limited options of matrices and penalties. In addition, we can formulate the problem using a canonical family-wise error rate control setup, thereby dispensing with E-values, which can at times be difficult to interpret.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The Apache licensed source code is available at https:\/\/github.com\/batmen-lab\/SGPvalue.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae729","type":"journal-article","created":{"date-parts":[[2024,12,10]],"date-time":"2024-12-10T18:41:42Z","timestamp":1733856102000},"source":"Crossref","is-referenced-by-count":11,"title":["A BLAST from the past: revisiting blastp\u2019s <i>E<\/i>-value"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1686-5917","authenticated-orcid":false,"given":"Yang Young","family":"Lu","sequence":"first","affiliation":[{"name":"Cheriton School of Computer Science, University of Waterloo , Waterloo, ON N2L 3G1,","place":["Canada"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7283-4715","authenticated-orcid":false,"given":"William Stafford","family":"Noble","sequence":"additional","affiliation":[{"name":"Department of Genome Sciences and Paul G. Allen School of Computer Science and Engineering, University of Washington , Seattle, WA 98105,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3209-5011","authenticated-orcid":false,"given":"Uri","family":"Keich","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, University of Sydney , Camperdown, NSW 2006,","place":["Australia"]}]}],"member":"286","published-online":{"date-parts":[[2024,12,4]]},"reference":[{"key":"2024121804354302600_btae729-B1","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1093\/nar\/29.2.351","article-title":"The estimation of statistical parameters for local alignment score distributions","volume":"29","author":"Altschul","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2024121804354302600_btae729-B2","first-page":"460","author":"Altschul","year":"1996"},{"key":"2024121804354302600_btae729-B3","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"A basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2024121804354302600_btae729-B4","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2024121804354302600_btae729-B5","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1214\/aoap\/1177005208","article-title":"A phase transition for the score in matching random sequences allowing deletions","volume":"4","author":"Arratia","year":"1994","journal-title":"Ann Appl Probab"},{"key":"2024121804354302600_btae729-B6","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1093\/nar\/28.1.45","article-title":"The swiss-prot protein sequence database and its supplement trembl in 2000","volume":"28","author":"Bairoch","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2024121804354302600_btae729-B7","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1214\/aos\/1013699998","article-title":"The control of the false discovery rate in multiple testing under dependency","volume":"29","author":"Benjamini","year":"2001","journal-title":"Ann Statist"},{"key":"2024121804354302600_btae729-B8","doi-asserted-by":"crossref","first-page":"6073","DOI":"10.1073\/pnas.95.11.6073","article-title":"Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships","volume":"95","author":"Brenner","year":"1998","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024121804354302600_btae729-B9","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1093\/nar\/28.1.254","article-title":"The ASTRAL compendium for sequence and structure analysis","volume":"28","author":"Brenner","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2024121804354302600_btae729-B10","first-page":"345","article-title":"A model of evolutionary change in proteins","volume":"5","author":"Dayhoff","year":"1978","journal-title":"Atlas Protein Seq Struct"},{"key":"2024121804354302600_btae729-B11","doi-asserted-by":"crossref","first-page":"2039","DOI":"10.1214\/aop\/1176988493","article-title":"Limit distribution of maximal non-aligned two-sequence segmental score","volume":"22","author":"Dembo","year":"1994","journal-title":"Ann Probab"},{"key":"2024121804354302600_btae729-B12","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024121804354302600_btae729-B13","doi-asserted-by":"crossref","first-page":"2264","DOI":"10.1073\/pnas.87.6.2264","article-title":"Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes","volume":"87","author":"Karlin","year":"1990","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024121804354302600_btae729-B14","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1214\/aos\/1176347616","article-title":"Statistical composition of high-scoring segments from molecular sequences","volume":"18","author":"Karlin","year":"1990","journal-title":"Ann Statist"},{"key":"2024121804354302600_btae729-B15","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1186\/s12859-017-1703-z","article-title":"PFASUM: a substitution matrix from Pfam structural alignments","volume":"18","author":"Keul","year":"2017","journal-title":"BMC Bioinfor"},{"key":"2024121804354302600_btae729-B16","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J Mol Biol"},{"key":"2024121804354302600_btae729-B17","doi-asserted-by":"crossref","first-page":"e1004509","DOI":"10.1371\/journal.pcbi.1004509","article-title":"Beyond the e-value: stratified statistics for protein domain prediction","volume":"11","author":"Ochoa","year":"2015","journal-title":"PLoS Comput Biol"},{"key":"2024121804354302600_btae729-B18","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/S0076-6879(96)66017-0","article-title":"Effective protein sequence comparison","volume":"266","author":"Pearson","year":"1996","journal-title":"Methods Enzymol"},{"key":"2024121804354302600_btae729-B19","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024121804354302600_btae729-B20","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1038\/nbt0308-274","article-title":"BLOSUM62 miscalculations improve search performance","volume":"26","author":"Styczynski","year":"2008","journal-title":"Nat Biotechnol"},{"key":"2024121804354302600_btae729-B21","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1186\/1471-2105-12-47","article-title":"Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling","volume":"12","author":"Wolfsheimer","year":"2011","journal-title":"BMC Bioinfor"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae729\/60951326\/btae729.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae729\/61219268\/btae729.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae729\/61219268\/btae729.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T04:36:02Z","timestamp":1734496562000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae729\/7916501"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,11,28]]},"references-count":21,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,11,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae729","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,12]]},"published":{"date-parts":[[2024,11,28]]},"article-number":"btae729"}}