{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T16:45:39Z","timestamp":1770482739519,"version":"3.49.0"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: High-throughput sequencing technologies place ever increasing demands on existing algorithms for sequence analysis. Algorithms for computing maximal exact matches (MEMs) between sequences appear in two contexts where high-throughput sequencing will vastly increase the volume of sequence data: (i) seeding alignments of high-throughput reads for genome assembly and (ii) designating anchor points for genome\u2013genome comparisons.<\/jats:p>\n               <jats:p>Results: We introduce a new algorithm for finding MEMs. The algorithm leverages a sparse suffix array (SA), a text index that stores every K-th position of the text. In contrast to a full text index that stores every position of the text, a sparse SA occupies much less memory. Even though we use a sparse index, the output of our algorithm is the same as a full text index algorithm as long as the space between the indexed suffixes is not greater than a minimum length of a MEM. By relying on partial matches and additional text scanning between indexed positions, the algorithm trades memory for extra computation. The reduced memory usage makes it possible to determine MEMs between significantly longer sequences.<\/jats:p>\n               <jats:p>Availability: Source code for the algorithm is available under a BSD open source license at http:\/\/compbio.cs.princeton.edu\/mems. The implementation can serve as a drop-in replacement for the MEMs algorithm in MUMmer 3.<\/jats:p>\n               <jats:p>Contact: \u00a0zkhan@cs.princeton.edu;mona@cs.princeton.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp275","type":"journal-article","created":{"date-parts":[[2009,4,24]],"date-time":"2009-04-24T00:25:34Z","timestamp":1240532734000},"page":"1609-1616","source":"Crossref","is-referenced-by-count":49,"title":["A practical algorithm for finding maximal exact matches in large sequence datasets using sparse suffix arrays"],"prefix":"10.1093","volume":"25","author":[{"given":"Zia","family":"Khan","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, 2Lewis-Sigler Institute for Integrative Genomics, 3Department of Molecular Biology, 4Department of Ecology and Evoluationary Biology and 5Howard Hughes Medical Institute, Princeton University, Princeton, New Jersey 08544, USA"},{"name":"1 Department of Computer Science, 2Lewis-Sigler Institute for Integrative Genomics, 3Department of Molecular Biology, 4Department of Ecology and Evoluationary Biology and 5Howard Hughes Medical Institute, Princeton University, Princeton, New Jersey 08544, USA"}]},{"given":"Joshua S.","family":"Bloom","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, 2Lewis-Sigler Institute for Integrative Genomics, 3Department of Molecular Biology, 4Department of Ecology and Evoluationary Biology and 5Howard Hughes Medical Institute, Princeton University, Princeton, New Jersey 08544, USA"},{"name":"1 Department of Computer Science, 2Lewis-Sigler Institute for Integrative Genomics, 3Department of Molecular Biology, 4Department of Ecology and Evoluationary Biology and 5Howard Hughes Medical Institute, Princeton University, Princeton, New Jersey 08544, USA"}]},{"given":"Leonid","family":"Kruglyak","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, 2Lewis-Sigler Institute for Integrative Genomics, 3Department of Molecular Biology, 4Department of Ecology and Evoluationary Biology and 5Howard Hughes Medical Institute, Princeton University, Princeton, New Jersey 08544, USA"},{"name":"1 Department of Computer Science, 2Lewis-Sigler Institute for Integrative Genomics, 3Department of Molecular Biology, 4Department of Ecology and Evoluationary Biology and 5Howard Hughes Medical Institute, Princeton University, Princeton, New Jersey 08544, USA"},{"name":"1 Department of Computer Science, 2Lewis-Sigler Institute for Integrative Genomics, 3Department of Molecular Biology, 4Department of Ecology and Evoluationary Biology and 5Howard Hughes Medical Institute, Princeton University, Princeton, New Jersey 08544, USA"}]},{"given":"Mona","family":"Singh","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, 2Lewis-Sigler Institute for Integrative Genomics, 3Department of Molecular Biology, 4Department of Ecology and Evoluationary Biology and 5Howard Hughes Medical Institute, Princeton University, Princeton, New Jersey 08544, USA"},{"name":"1 Department of Computer Science, 2Lewis-Sigler Institute for Integrative Genomics, 3Department of Molecular Biology, 4Department of Ecology and Evoluationary Biology and 5Howard Hughes Medical Institute, Princeton University, Princeton, New Jersey 08544, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,4,23]]},"reference":[{"key":"2023013112133741400_B1","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/S1570-8667(03)00065-0","article-title":"Replacing suffix trees with enhanced suffix arrays","volume":"2","author":"Abouelhoda","year":"2004","journal-title":"J. Discrete Algorithms"},{"key":"2023013112133741400_B2","first-page":"7","article-title":"Enhanced suffix arrays and applications. Chapter 7","volume-title":"Handbook of Computational Molecular Biology.","author":"Abouelhoda","year":"2006"},{"key":"2023013112133741400_B3","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1101\/gr.1960404","article-title":"MAVID: Constrained ancestral alignment of multiple sequences","volume":"14","author":"Bray","year":"2004","journal-title":"Genome Res."},{"key":"2023013112133741400_B4","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1016\/j.compbiolchem.2005.04.004","article-title":"GAME: a simple and efficient whole genome alignment method using maximal exact match filtering","volume":"29","author":"Choi","year":"2005","journal-title":"Comp. Biol. Chem."},{"key":"2023013112133741400_B5"},{"key":"2023013112133741400_B6","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1126\/science.1162986","article-title":"Real-time DNA sequencing from single polymerase molecules","volume":"323","author":"Eid","year":"2009","journal-title":"Science"},{"key":"2023013112133741400_B7","first-page":"328","article-title":"Suffix arrays on words","volume-title":"Proceedings of the 18th Annual Symposium on Combinatorial Pattern Matching (CPM'07). Vol. of Lecture Notes in Computer Science.","author":"Ferragina","year":"2007"},{"key":"2023013112133741400_B8","article-title":"Compressed text indexes: from theory to practice","volume":"13","author":"Ferragina","year":"2009","journal-title":"ACM J. Exp. Algorithmics (JEA)"},{"key":"2023013112133741400_B9","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511574931","volume-title":"Algorithms on Strings, Trees, and Sequences.","author":"Gusfield","year":"1997"},{"issue":"Suppl 1","key":"2023013112133741400_B10","doi-asserted-by":"crossref","first-page":"S312","DOI":"10.1093\/bioinformatics\/18.suppl_1.S312","article-title":"Efficient multiple genome alignment","volume":"18","author":"H\u00f6hl","year":"2002","journal-title":"Bioinformatics"},{"key":"2023013112133741400_B11","doi-asserted-by":"crossref","first-page":"1916","DOI":"10.1073\/pnas.0307971100","article-title":"Whole-genome shotgun assembly and comparison of human genome assemblies","volume":"101","author":"Istrail","year":"2004","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112133741400_B12","first-page":"219","article-title":"Sparse suffix trees","volume-title":"COCOON 1996. vol. 1090 of Lecture Notes in Computer Science.","author":"K\u00e4rkk\u00e4inen","year":"1996"},{"key":"2023013112133741400_B13","first-page":"181","article-title":"Linear-time longest-common-prefix computation in suffix arrays and its applications","volume-title":"Proceedings of the 12th Symposium on Combinatorial Pattern Matching (CPM '01). Vol. 2089 of Lecture Notes in Computer Science.","author":"Kasai","year":"2001"},{"key":"2023013112133741400_B14","doi-asserted-by":"crossref","first-page":"1149","DOI":"10.1002\/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O","article-title":"Reducing the space requirement of suffix trees","volume":"29","author":"Kurtz","year":"1999","journal-title":"Soft. Pract. Exp."},{"key":"2023013112133741400_B15","doi-asserted-by":"crossref","first-page":"R12","DOI":"10.1186\/gb-2004-5-2-r12","article-title":"Versatile and open software for comparing large genomes","volume":"5","author":"Kurtz","year":"2004","journal-title":"Genome Biol."},{"key":"2023013112133741400_B16","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1016\/j.tcs.2007.07.017","article-title":"Faster suffix sorting","volume":"387","author":"Larsson","year":"2007","journal-title":"Theor. Comp. Sci."},{"key":"2023013112133741400_B17","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1137\/0222058","article-title":"Suffix arrays: a new method for on-line string searches","volume":"22","author":"Manber","year":"1993","journal-title":"SIAM J. Comput."},{"key":"2023013112133741400_B18","first-page":"372","article-title":"Two space saving tricks for linear time LCP array computation","volume-title":"SWAT 2004. Vol. 3111 of Lecture Notes in Computer Science.","author":"Manzini","year":"2004"},{"key":"2023013112133741400_B19","first-page":"5","article-title":"Engineering radix sort","volume":"6","author":"McIlroy","year":"1993","journal-title":"Comput. Syst."},{"key":"2023013112133741400_B20","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","article-title":"A whole-genome assembly of Drosophila","volume":"287","author":"Myers","year":"2000","journal-title":"Science"},{"key":"2023013112133741400_B21","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1089\/cmb.2007.0105","article-title":"Space efficient computation of rare maximal exact matches between multiple sequences","volume":"15","author":"Ohlebusch","year":"2008","journal-title":"J. Comput. Biol."},{"key":"2023013112133741400_B22","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1126\/science.1161847","article-title":"A physical map of the 1-gigabase bread wheat chromosome 3B","volume":"322","author":"Paux","year":"2008","journal-title":"Science"},{"key":"2023013112133741400_B23","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1016\/j.tig.2007.12.006","article-title":"Bioinformatics challenges of new sequencing technology","volume":"24","author":"Pop","year":"2008","journal-title":"Trends Genet."},{"key":"2023013112133741400_B24","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1186\/1471-2105-8-474","article-title":"High-throughput sequence alignment using graphics processing units","volume":"8","author":"Schatz","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013112133741400_B25","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1101\/gr.809403","article-title":"Human-mouse alignments with BLASTZ","volume":"13","author":"Schwartz","year":"2003","journal-title":"Genome Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/13\/1609\/48996696\/bioinformatics_25_13_1609.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/13\/1609\/48996696\/bioinformatics_25_13_1609.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:42:42Z","timestamp":1675201362000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/13\/1609\/196165"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,4,23]]},"references-count":25,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2009,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp275","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,7,1]]},"published":{"date-parts":[[2009,4,23]]}}}