{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,10,10]],"date-time":"2023-10-10T12:16:48Z","timestamp":1696940208700},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,1,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Protein sequence alignment plays a critical role in computational biology as it is an integral part in many analysis tasks designed to solve problems in comparative genomics, structure and function prediction, and homology modeling.<\/jats:p><jats:p>Methods: We have developed novel sequence alignment algorithms that compute the alignment between a pair of sequences based on short fixed- or variable-length high-scoring subsequences. Our algorithms build the alignments by repeatedly selecting the highest scoring pairs of subsequences and using them to construct small portions of the final alignment. We utilize PSI-BLAST generated sequence profiles and employ a profile-to-profile scoring scheme derived from PICASSO.<\/jats:p><jats:p>Results: We evaluated the performance of the computed alignments on two recently published benchmark datasets and compared them against the alignments computed by existing state-of-the-art dynamic programming-based profile-to-profile local and global sequence alignment algorithms. Our results show that the new algorithms achieve alignments that are comparable with or better than those achieved by existing algorithms. Moreover, our results also showed that these algorithms can be used to provide better information as to which of the aligned positions are more reliable\u2014a critical piece of information for comparative modeling applications.<\/jats:p><jats:p>Contact: \u00a0rangwala@cs.umn.edu<\/jats:p><jats:p>Supplementary information: \u00a0<\/jats:p>","DOI":"10.1093\/bioinformatics\/btl297","type":"journal-article","created":{"date-parts":[[2007,1,19]],"date-time":"2007-01-19T18:51:12Z","timestamp":1169232672000},"page":"e17-e23","source":"Crossref","is-referenced-by-count":2,"title":["Incremental window-based protein sequence alignment algorithms"],"prefix":"10.1093","volume":"23","author":[{"given":"Huzefa","family":"Rangwala","sequence":"first","affiliation":[{"name":"Department of Computer Science & Engineering, University of Minnesota \u00a0 Minneapolis, MN 55455, USA"}]},{"given":"George","family":"Karypis","sequence":"additional","affiliation":[{"name":"Department of Computer Science & Engineering, University of Minnesota \u00a0 Minneapolis, MN 55455, USA"}]}],"member":"286","published-online":{"date-parts":[[2007,1,15]]},"reference":[{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped blast and psi-blast: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"352","DOI":"10.1110\/ps.40501","article-title":"Livebench: continuous benchmarking of protein structure prediction servers","volume":"10","author":"Bujnicki","year":"2001","journal-title":"Protein Sci."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1093\/bioinformatics\/18.2.306","article-title":"Predicting reliable regions in protein sequence alignments","volume":"18","author":"Cline","year":"2002","journal-title":"Bioinformatics"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/1471-2105-2-5","article-title":"A study of quality measured for protein threading models","volume":"2","author":"Cristobal","year":"2001","journal-title":"BMC Bioinformatics"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1093\/bioinformatics\/bth091","article-title":"Coach: profile-profile alignment of protein families using hidden Markov models","volume":"20","author":"Edgar","year":"2004","journal-title":"Bioinformatics"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"1301","DOI":"10.1093\/bioinformatics\/bth090","article-title":"A comparison of scoring functions for protein sequence profile alignment","volume":"20","author":"Edgar","year":"2004","journal-title":"Bioinformatics"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1002\/prot.10043","article-title":"A study on protein sequence alignment quality","volume":"46","author":"Elofsson","year":"2002","journal-title":"Proteins"},{"key":"2023041107141439000_","author":"Fawcett","year":"2004"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1002\/prot.10036","article-title":"Cafasp2: the second critical assessment of fully automated structure prediction methods","volume":"45","author":"Fischer","year":"2001","journal-title":"Proteins"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/S0097-8485(96)80004-0","article-title":"Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching","volume":"20","author":"Gribskov","year":"1996","journal-title":"Comput. Chem."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"4355","DOI":"10.1073\/pnas.84.13.4355","article-title":"Profile analysis: detection of distantly related proteins","volume":"84","author":"Gribskov","year":"1987","journal-title":"Proc. Natl Acad. Sci USA"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511574931","volume-title":"Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology","author":"Gusfield","year":"1997"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1093\/bioinformatics\/17.3.272","article-title":"Picasso: generating a covering set of protein family profiles","volume":"17","author":"Heger","year":"2001","journal-title":"Bioinformatics"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1126\/science.273.5275.595","article-title":"Mapping the protein universe","volume":"273","author":"Holm","year":"1996","journal-title":"Science"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"1702","DOI":"10.1110\/ps.4820102","article-title":"In search for more accurate alignments in the twilight zone","volume":"11","author":"Jaroszewski","journal-title":"Protein Sci."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1038\/358086a0","article-title":"A new approach to protein fold recognition","volume":"358","author":"Jones","year":"1992","journal-title":"Nature"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1006\/jmbi.2000.3741","article-title":"Enhanced genome annotation using structural profiles in the program 3d-PSSM","volume":"299","author":"Kelley","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023041107141439000_","first-page":"152","article-title":"Profile-based string kernels for remote homology detection and motif extraction","volume-title":"Comput. Syst. Bioinform.","author":"Kuang","year":"2004"},{"key":"2023041107141439000_","first-page":"564","article-title":"The spectrum kernel: a string kernel for SVM protein classification","author":"Leslie","year":"2002","journal-title":"Pac. Symp. Biocomput."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"5913","DOI":"10.1073\/pnas.95.11.5913","article-title":"A unified statistical framework for sequence comparison and structure comparison","volume":"95","author":"Levitt","year":"1998","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"1071","DOI":"10.1110\/ps.03379804","article-title":"Alignment of protein sequences by their profiles","volume":"13","author":"Marti-Renom","year":"2004","journal-title":"Protein Sci."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1093\/protein\/9.2.127","article-title":"Quantifying the local reliability of sequence alignment","volume":"9","author":"Mevissen","year":"1996","journal-title":"Protein Eng."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"1531","DOI":"10.1093\/bioinformatics\/btg185","article-title":"Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments","volume":"19","author":"Mittelman","year":"2003","journal-title":"Bioinformatics"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"1319","DOI":"10.1006\/jmbi.2000.3541","article-title":"Combination of threading potentials and sequence profiles improves fold recognition","volume":"296","author":"Panchenko","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"4239","DOI":"10.1093\/bioinformatics\/bti687","article-title":"Profile based direct kernels for remote homology detection and fold recognition","volume":"21","author":"Rangwala","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1002\/(SICI)1097-0134(20000701)40:1<6::AID-PROT30>3.0.CO;2-7","article-title":"Large-scale comparison of protein sequence alignments with structural alignments","volume":"40","author":"Sauder","year":"2000","journal-title":"Proteins"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1093\/bioinformatics\/18.6.847","article-title":"A novel approach to local reliability of sequence alignments","volume":"18","author":"Schlosshauer","year":"2002","journal-title":"Bioinformatics"},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1093\/protein\/11.9.739","article-title":"Protein structure alignment by incremental combinatorial extension (CE) of the optimal path","volume":"11","author":"Shindyalov","year":"1998","journal-title":"Protein Eng."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/S0022-2836(03)00622-3","article-title":"Predicting reliable regions in protein alignments from sequence profiles","volume":"330","author":"Tress","year":"2003","journal-title":"J. Mol. Biol."},{"key":"2023041107141439000_","doi-asserted-by":"crossref","first-page":"1612","DOI":"10.1110\/ps.03601504","article-title":"Scoring profile-to-profile sequence alignments","volume":"13","author":"Wang","year":"2004","journal-title":"Protein Sci."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/2\/e17\/49820425\/bioinformatics_23_2_e17.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/2\/e17\/49820425\/bioinformatics_23_2_e17.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,10]],"date-time":"2023-05-10T13:04:02Z","timestamp":1683723842000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/2\/e17\/201850"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,1,15]]},"references-count":35,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2007,1,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl297","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,1,15]]},"published":{"date-parts":[[2007,1,15]]}}}