{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T07:04:24Z","timestamp":1776495864486,"version":"3.51.2"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing very long strings such as the human genome in the main memory. For example, a BWT index for the human genome (with about 3 billion characters) occupies just around 1 G bytes. However, these indexes are designed for exact pattern matching, which is too stringent for biological applications. The demand is often on finding local alignments (pairs of similar substrings with gaps allowed). Without indexing, one can use dynamic programming to find all the local alignments between a text T and a pattern P in O(|T||P|) time, but this would be too slow when the text is of genome scale (e.g. aligning a gene with the human genome would take tens to hundreds of hours). In practice, biologists use heuristic-based software such as BLAST, which is very efficient but does not guarantee to find all local alignments.<\/jats:p>\n               <jats:p>Results: In this article, we show how to build a software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments. Experiments reveal that BWT-SW is very efficient (e.g. aligning a pattern of length 3 000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically for a simpler similarity model (with gaps disallowed), and we show that the expected running time is O(|T|0.628|P|) for random strings. As far as we know, BWT-SW is the first practical tool that can find all local alignments. Yet BWT-SW is not meant to be a replacement of BLAST, as BLAST is still several times faster than BWT-SW for long patterns and BLAST is indeed accurate enough in most cases (we have used BWT-SW to check against the accuracy of BLAST and found that only rarely BLAST would miss some significant alignments).<\/jats:p>\n               <jats:p>Availability: \u00a0www.cs.hku.hk\/~ckwong3\/bwtsw<\/jats:p>\n               <jats:p>Contact: \u00a0twlam@cs.hku.hk<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn032","type":"journal-article","created":{"date-parts":[[2008,1,29]],"date-time":"2008-01-29T01:34:06Z","timestamp":1201570446000},"page":"791-797","source":"Crossref","is-referenced-by-count":98,"title":["Compressed indexing and local alignment of DNA"],"prefix":"10.1093","volume":"24","author":[{"given":"T. W.","family":"Lam","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, University of Hong Kong, Hong Kong, China and 2Department of Computer Science, National University of Singapore, Singapore"}]},{"given":"W. K.","family":"Sung","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Hong Kong, Hong Kong, China and 2Department of Computer Science, National University of Singapore, Singapore"}]},{"given":"S. L.","family":"Tam","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Hong Kong, Hong Kong, China and 2Department of Computer Science, National University of Singapore, Singapore"}]},{"given":"C. K.","family":"Wong","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Hong Kong, Hong Kong, China and 2Department of Computer Science, National University of Singapore, Singapore"}]},{"given":"S. M.","family":"Yiu","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Hong Kong, Hong Kong, China and 2Department of Computer Science, National University of Singapore, Singapore"}]}],"member":"286","published-online":{"date-parts":[[2008,1,28]]},"reference":[{"key":"2023020209513941500_B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023020209513941500_B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: A new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucl. Acids Res"},{"key":"2023020209513941500_B3","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1145\/299432.299460","article-title":"q-Gram based database searching using a suffix array (quasar)","author":"Burkhardt","year":"1999","journal-title":"RECOMB"},{"key":"2023020209513941500_B4","article-title":"A block-sorting lossless data compression algorithm","author":"Burrow","year":"1994","journal-title":"Technical Report 124, Digital Equipment Corporation"},{"key":"2023020209513941500_B5","first-page":"4","article-title":"Indexing DNA sequences using q-grams","author":"Cao","year":"2005","journal-title":"DASFAA"},{"key":"2023020209513941500_B6","first-page":"390","article-title":"Opportunistic data structures with applications","author":"Ferragina","year":"2000","journal-title":"FOCS"},{"key":"2023020209513941500_B7","first-page":"269","article-title":"An experimental study of an opportunistic index","author":"Ferragina","year":"2001","journal-title":"SODA"},{"key":"2023020209513941500_B8","doi-asserted-by":"crossref","first-page":"873","DOI":"10.1093\/bioinformatics\/18.6.873","article-title":"SST: An algorithm for finding near-exact sequence matches in time proportional to the logarithm of the database size","volume":"18","author":"Giladi","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020209513941500_B9","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1145\/335305.335351","article-title":"Compressed suffix arrays and suffix trees with applications to text indexing and string matching","author":"Grossi","year":"2000","journal-title":"STOC"},{"key":"2023020209513941500_B10","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511574931","article-title":"Algorithms on Strings, Trees, and Sequences","author":"Gusfield","year":"1997"},{"key":"2023020209513941500_B11","doi-asserted-by":"crossref","first-page":"2306","DOI":"10.1101\/gr.1350803","article-title":"Annotating large genomes with exact word matches","volume":"13","author":"Healy","year":"2003","journal-title":"Genomes Research"},{"key":"2023020209513941500_B12","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/s00453-006-1228-8","article-title":"Constructing compressed suffix arrays with large alphabets","volume":"48","author":"Hon","year":"2007","journal-title":"Algorithmica"},{"key":"2023020209513941500_B13","first-page":"31","article-title":"Practical aspects of compressed suffix arrays and FM-Index in searching DNA sequences","author":"Hon","year":"2004","journal-title":"ALENEX\/ANALC"},{"key":"2023020209513941500_B14","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1007\/s007780200064","article-title":"Database indexing for large DNA and protein sequence collections","volume":"11","author":"Hunt","year":"2002","journal-title":"The VLDB J"},{"key":"2023020209513941500_B15","first-page":"2264","article-title":"Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes","author":"Karlin","year":"1990"},{"key":"2023020209513941500_B16","doi-asserted-by":"crossref","first-page":"1149","DOI":"10.1002\/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O","article-title":"Reducing the space requirement of suffix trees","volume":"29","author":"Kurtz","year":"1999","journal-title":"Software - Practice and Exp"},{"key":"2023020209513941500_B17","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1142\/S0219720004000661","article-title":"PatterHunter II: Highly sensitive and fast homology search","volume":"2","author":"Li","year":"2004","journal-title":"J. Bioinformatics Comput. Biol"},{"key":"2023020209513941500_B18","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1089\/cmb.2005.12.407","article-title":"Space-efficient whole genome comparisons with Burrows-Wheeler transforms","volume":"12","author":"Lippert","year":"2005","journal-title":"J. Comput. Biol"},{"key":"2023020209513941500_B19","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1145\/321941.321946","article-title":"A space-economical suffix tree construction algorithm","volume":"23","author":"McCreight","year":"1976","journal-title":"J. ACM"},{"key":"2023020209513941500_B20","first-page":"910","article-title":"OASIS: An online and accurate technique for local-alignment searches on biological sequences","author":"Meek","year":"2003","journal-title":"VLDB"},{"key":"2023020209513941500_B21","first-page":"359","article-title":"Effective indexing and filtering for similarity search in large biosequence databases","author":"Ozturk","year":"2003","journal-title":"BIBE"},{"key":"2023020209513941500_B22","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1016\/S0196-6774(03)00087-7","article-title":"New text indexing functionalities of the compressed suffix arrays","volume":"48","author":"Sadakane","year":"2003","journal-title":"J. Algorithms"},{"key":"2023020209513941500_B23","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023020209513941500_B24","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1109\/69.979973","article-title":"Indexing and retrieval for genomic databases","volume":"14","author":"Williams","year":"2002","journal-title":"IEEE Trans. Knowledge Data Eng"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/6\/791\/49047892\/bioinformatics_24_6_791.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/6\/791\/49047892\/bioinformatics_24_6_791.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T10:47:23Z","timestamp":1675334843000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/6\/791\/193908"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,1,28]]},"references-count":24,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2008,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn032","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,3,15]]},"published":{"date-parts":[[2008,1,28]]}}}