{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,12,17]],"date-time":"2023-12-17T12:31:05Z","timestamp":1702816265642},"reference-count":16,"publisher":"Oxford University Press (OUP)","issue":"23","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1128,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Sequence similarity searches performed with BLAST, SSEARCH and FASTA achieve high sensitivity by using scoring matrices (e.g. BLOSUM62) that target low identity (&lt;33%) alignments. Although such scoring matrices can effectively identify distant homologs, they can also produce local alignments that extend beyond the homologous regions.<\/jats:p>\n               <jats:p>Results: We measured local alignment start\/stop boundary accuracy using a set of queries where the correct alignment boundaries were known, and found that 7% of BLASTP and 8% of SSEARCH alignment boundaries were overextended. Overextended alignments include non-homologous sequences; they occur most frequently between sequences that are more closely related (&gt;33% identity). Adjusting the scoring matrix to reflect the identity of the homologous sequence can correct higher identity overextended alignment boundaries. In addition, the scoring matrix that produced a correct alignment could be reliably predicted based on the sequence identity seen in the original BLOSUM62 alignment. 
Realigning with the predicted scoring matrix corrected 37% of all overextended alignments, resulting in more correct alignments than using BLOSUM62 alone.<\/jats:p>\n               <jats:p>Availability: RefProtDom2 (RPD2) sequences and the FASTA software are available from http:\/\/faculty.virginia.edu\/wrpearson\/fasta.<\/jats:p>\n               <jats:p>Contact: wrp@virginia.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt517","type":"journal-article","created":{"date-parts":[[2013,9,1]],"date-time":"2013-09-01T00:34:29Z","timestamp":1377995669000},"page":"3007-3013","source":"Crossref","is-referenced-by-count":10,"title":["Adjusting scoring matrices to correct overextended alignments"],"prefix":"10.1093","volume":"29","author":[{"given":"Lauren J.","family":"Mills","sequence":"first","affiliation":[{"name":"Department of Molecular, Cell and Developmental Biology and Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA"}]},{"given":"William R.","family":"Pearson","sequence":"additional","affiliation":[{"name":"Department of Molecular, Cell and Developmental Biology and Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA"}]}],"member":"286","published-online":{"date-parts":[[2013,8,31]]},"reference":[{"key":"2023012810490724400_btt517-B1","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1016\/0022-2836(91)90193-A","article-title":"Amino acid substitution matrices from an information theoretic perspective","volume":"219","author":"Altschul","year":"1991","journal-title":"J. Mol. 
Biol."},{"key":"2023012810490724400_btt517-B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012810490724400_btt517-B3","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1093\/bioinformatics\/17.4.327","article-title":"A new approach to sequence comparison: normalized sequence alignment","volume":"17","author":"Arslan","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012810490724400_btt517-B4","doi-asserted-by":"crossref","first-page":"6073","DOI":"10.1073\/pnas.95.11.6073","article-title":"Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships","volume":"95","author":"Brenner","year":"1998","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012810490724400_btt517-B5","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinform."},{"key":"2023012810490724400_btt517-B6","first-page":"387","article-title":"Locating well-conserved regions within a pairwise alignment","volume":"9","author":"Chao","year":"1993","journal-title":"Comput. Applic. 
Biosci."},{"key":"2023012810490724400_btt517-B7","first-page":"345","article-title":"A model of evolutionary change in proteins","volume-title":"Atlas of Protein Sequence and Structure","author":"Dayhoff","year":"1978"},{"key":"2023012810490724400_btt517-B8","doi-asserted-by":"crossref","first-page":"2177","DOI":"10.1093\/nar\/gkp1219","article-title":"Homologous over-extension: a challenge for iterative similarity searches","volume":"38","author":"Gonzalez","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012810490724400_btt517-B9","doi-asserted-by":"crossref","first-page":"2361","DOI":"10.1093\/bioinformatics\/btq426","article-title":"RefProtDom: a protein database with improved domain boundaries and homology relationships","volume":"26","author":"Gonzalez","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012810490724400_btt517-B10","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino-acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012810490724400_btt517-B11","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1093\/oxfordjournals.molbev.a003985","article-title":"Estimating amino acid substitution models: A comparison of Dayhoff\u2019s estimator, the resolvent approach and a maximum likelihood method","volume":"19","author":"Muller","year":"2002","journal-title":"Mol. Biol. 
Evol."},{"key":"2023012810490724400_btt517-B12","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1002\/pro.5560040613","article-title":"Comparison of methods for searching protein sequence databases","volume":"4","author":"Pearson","year":"1995","journal-title":"Protein Sci."},{"key":"2023012810490724400_btt517-B13","first-page":"185","article-title":"Flexible sequence similarity searching with the FASTA3 program package","volume":"132","author":"Pearson","year":"2000","journal-title":"Methods Mol. Biol."},{"key":"2023012810490724400_btt517-B14","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.sbi.2005.05.005","article-title":"The limits of protein sequence comparison?","volume":"15","author":"Pearson","year":"2005","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012810490724400_btt517-B15","doi-asserted-by":"crossref","first-page":"D290","DOI":"10.1093\/nar\/gkr1065","article-title":"The Pfam protein families database","volume":"40","author":"Punta","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012810490724400_btt517-B16","doi-asserted-by":"crossref","first-page":"1500","DOI":"10.1093\/bioinformatics\/18.11.1500","article-title":"Empirical determination of effective gap penalties for sequence 
comparison","volume":"18","author":"Reese","year":"2002","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/23\/3007\/48897472\/bioinformatics_29_23_3007.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/23\/3007\/48897472\/bioinformatics_29_23_3007.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,28]],"date-time":"2023-01-28T12:50:15Z","timestamp":1674910215000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/23\/3007\/246397"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,31]]},"references-count":16,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2013,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt517","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2013,12,1]]},"published":{"date-parts":[[2013,8,31]]}}}