{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:40:14Z","timestamp":1760244014297,"version":"build-2065373602"},"reference-count":58,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2009,9,10]],"date-time":"2009-09-10T00:00:00Z","timestamp":1252540800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>A compressed full-text self-index for a text T is a data structure requiring reduced space and able to search for patterns P in T. It can also reproduce any substring of T, thus actually replacing T. Despite the recent explosion of interest on compressed indexes, there has not been much progress on functionalities beyond the basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest, say, in bioinformatics. We study ASM algorithms for Lempel-Ziv compressed indexes and for compressed suffix trees\/arrays. Most compressed self-indexes belong to one of these classes. We start by adapting the classical method of partitioning into exact search to self-indexes, and optimize it over a representative of either class of self-index. Then, we show that a Lempel- Ziv index can be seen as an extension of the classical q-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to a Lempel- Ziv index. Finally, we improve hierarchical verification, a successful technique for sequential searching, so as to extend the matches of pattern pieces to the left or right. Most compressed suffix trees\/arrays support the required bidirectionality, thus enabling the implementation of the improved technique. In turn, the improved verification largely reduces the accesses to the text, which are expensive in self-indexes. We show experimentally that our algorithms are competitive and provide useful space-time tradeoffs compared to classical indexes.<\/jats:p>","DOI":"10.3390\/a2031105","type":"journal-article","created":{"date-parts":[[2009,9,11]],"date-time":"2009-09-11T03:22:26Z","timestamp":1252639346000},"page":"1105-1136","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Approximate String Matching with Compressed Indexes"],"prefix":"10.3390","volume":"2","author":[{"given":"Lu\u00eds M.  S.","family":"Russo","sequence":"first","affiliation":[{"name":"INESC-ID, R. Alves Redol 9, 1000 Lisboa, Portugal"},{"name":"CITI, Departamento de Inform\u00b4atica, Faculdade de Ci\u02c6encias e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal"}]},{"given":"Gonzalo","family":"Navarro","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Chile, Avenida Blanco Encalada, 2120, 837-0459 Santiago, Chile Santiago, Chile"}]},{"given":"Arlindo  L.","family":"Oliveira","sequence":"additional","affiliation":[{"name":"INESC-ID, R. Alves Redol 9, 1000 Lisboa, Portugal"},{"name":"Instituto Superior T\u00b4ecnico, Universidade T\u00b4ecnica de Lisboa, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal"}]},{"given":"Pedro","family":"Morales","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Chile, Avenida Blanco Encalada, 2120, 837-0459 Santiago, Chile Santiago, Chile"}]}],"member":"1968","published-online":{"date-parts":[[2009,9,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1145\/375360.375365","article-title":"A guided tour to approximate string matching","volume":"33","author":"Navarro","year":"2001","journal-title":"ACM Comput. Surv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Chang, W., and Marr, T. (1994, January June). Approximate string matching and local similarity. Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching (CPM), Asilomar, CA, USA.","DOI":"10.1007\/3-540-58094-8_23"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Fredriksson, K., and Navarro, G. (2004). Average-optimal single and multiple approximate string matching. ACM J. Exp. Algorithmics, 9, No. 1.4.","DOI":"10.1145\/1005813.1041513"},{"key":"ref_4","first-page":"19","article-title":"Indexing methods for approximate string matching","volume":"24","author":"Navarro","year":"2001","journal-title":"IEEE Data Eng. Bull."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Sung, W.K. (2008). Indexed approximate string matching, Springer.","DOI":"10.1007\/978-0-387-30162-4_188"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Cole, R., Gottlieb, L.A., and Lewenstein, M. (2004, January June). Dictionary matching and indexing with errors and don\u2019t cares. Proceedings of the 36th ACM Symposium on Theory of Computing (STOC), Chicago, IL, USA.","DOI":"10.1145\/1007352.1007374"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Maa\u00df, M., and Nowak, J. (2005, January June). Text indexing with errors. Proceedings of the 16th Annual Symposium on Combinatorial Pattern Matching (CPM), Jeju Island, Korea.","DOI":"10.1007\/11496656_3"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., and Wong, S.S. (2006, January July). A linear size index for approximate pattern matching. Proceedings of the 17th Annual Symposium on Combinatorial Pattern Matching (CPM), Barcelona, Spain.","DOI":"10.1007\/11780441_6"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Coelho, L., and Oliveira, A. (2006, January October). Dotted suffix trees: a structure for approximate text indexing. Proceedings of the 13th International Symposium on String Processing and Information Retrieval (SPIRE), Glasgow, UK.","DOI":"10.1007\/11880561_27"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Weiner, P. (1973, January October). Linear pattern matching algorithms. 14th IEEE Annual Symposium on Switching and Automata Theory, Iowa City, USA.","DOI":"10.1109\/SWAT.1973.13"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1137\/0222058","article-title":"Suffix arrays: a new method for on-line string searches","volume":"22","author":"Manber","year":"1993","journal-title":"SIAM J. Comput."},{"key":"ref_12","unstructured":"Gonnet, G. (1992). A tutorial introduction to Computational Biochemistry using Darwin, Informatik E.T.H.. Technical report."},{"key":"ref_13","unstructured":"Ukkonen, E. (1994, January June). Approximate string matching over suffix trees. Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching (CPM), Asilomar, CA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Cobbs, A. (1995, January July). Fast approximate matching using suffix trees. Proceedings of the 6th Annual Symposium on Combinatorial Pattern Matching (CPM), Espoo, Finland.","DOI":"10.1007\/3-540-60044-2_33"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Sutinen, E., and Tarhio, J. (1996, January June). Filtration with q-samples in approximate string matching. In Proceeding of the 7th Annual Symposium on Combinatorial Pattern Matching (CPM), Laguna Beach, CA, USA.","DOI":"10.1007\/3-540-61258-0_4"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Navarro, G., and Baeza-Yates, R. (1998). A practical q-gram index for text retrieval allowing errors. CLEI Electron. J., 1, No. 2.","DOI":"10.19153\/cleiej.1.2.3"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1007\/BF01185432","article-title":"A sublinear algorithm for approximate keyword searching","volume":"12","author":"Myers","year":"1994","journal-title":"Algorithmica"},{"key":"ref_18","first-page":"205","article-title":"A hybrid indexing method for approximate string matching","volume":"1","author":"Navarro","year":"2000","journal-title":"J. Discrete Algorithms"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/j.jda.2004.08.003","article-title":"Indexing text with approximate q-grams","volume":"3","author":"Navarro","year":"2005","journal-title":"J. Discrete Algorithms"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1149","DOI":"10.1002\/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O","article-title":"Reducing the space requirement of suffix trees","volume":"29","author":"Kurtz","year":"1999","journal-title":"Softw. Pract. Exper."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Navarro, G., and M\u00e4kinen, V. (2007). Compressed full-text indexes. ACM Comput. Surv., 39, No. 2.","DOI":"10.1145\/1216370.1216372"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1145\/382780.382782","article-title":"An analysis of the Burrows-Wheeler transform","volume":"48","author":"Manzini","year":"2001","journal-title":"JACM"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1109\/TIT.1978.1055934","article-title":"Compression of individual sequences via variable length coding","volume":"24","author":"Ziv","year":"1978","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1145\/1082036.1082039","article-title":"Indexing compressed text","volume":"52","author":"Ferragina","year":"2005","journal-title":"JACM"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1016\/S1570-8667(03)00066-2","article-title":"Indexing text using the Ziv-Lempel trie","volume":"2","author":"Navarro","year":"2004","journal-title":"J. Discrete Algorithms"},{"key":"ref_26","unstructured":"K\u00e4rkk\u00e4inen, J., and Ukkonen, E. (1996, January August). Lempel-Ziv parsing and sublinear-size index structures for string matching. Proceedings of the 3rd South American Workshop on String Processing (WSP), Recife, Brazil."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Arroyuelo, D., Navarro, G., and Sadakane, K. (2006, January July). Reducing the space requirement of LZ-Index. Proceedings of the 17th Annual Symposium on Combinatorial Pattern Matching (CPM), Barcelona, Spain.","DOI":"10.1007\/11780441_29"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1007\/s10791-008-9050-3","article-title":"A compressed self-index using a Ziv-Lempel dictionary","volume":"11","author":"Russo","year":"2008","journal-title":"Inf. Retr."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1016\/S0196-6774(03)00087-7","article-title":"New text indexing functionalities of the compressed suffix arrays","volume":"48","author":"Sadakane","year":"2003","journal-title":"J. Algorithms"},{"key":"ref_30","unstructured":"Grossi, R., Gupta, A., and Vitter, J. (2003, January January). High-order entropy-compressed text indexes. Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Baltimore, MD, USA."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ferragina, P., Manzini, G., M\u00e4kinen, V., and Navarro, G. (2007). Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms, 3, No. 20.","DOI":"10.1145\/1240233.1240243"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"378","DOI":"10.1137\/S0097539702402354","article-title":"Compressed suffix arrays and suffix trees with applications to text indexing and string matching","volume":"35","author":"Grossi","year":"2005","journal-title":"SIAM J. Comput."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1007\/s00224-006-1198-x","article-title":"Compressed suffix trees with full functionality","volume":"41","author":"Sadakane","year":"2007","journal-title":"Theory Comput. Syst."},{"key":"ref_34","unstructured":"Fischer, J., M\u00e4kinen, V., and Navarro, G. (2008, January June). An(other) entropy-bounded compressed suffix tree. Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM), Pisa, Italy."},{"key":"ref_35","unstructured":"Russo, L., Navarro, G., and Oliveira, A. (2008, January April). Fully-compressed suffix trees. Proceedings of the 8th Latin American Symposium on Theoretical Informatics (LATIN), B\u00fazios, Brazil."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ferragina, P., Gonz\u00e1lez, R., Navarro, G., and Venturini, R. (2009). Compressed text indexes: From theory to practice. ACM J. Exp. Algorithmics (JEA), 13, No. 12.","DOI":"10.1145\/1412228.1455268"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Huynh, T., Hon, W.K., Lam, T.W., and Sung, W.K. (2005, January June). Approximate string matching using compressed suffix arrays. Proceedings of the 16th Annual Symposium on Combinatorial Pattern Matching (CPM), Jeju Island, Korea.","DOI":"10.1007\/978-3-540-27801-6_33"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lam, T.W., Sung, W.K., and Wong, S.S. (2005, January December). Improved approximate string matching using compressed suffix data structures. Proceedings of the 16th Annual International Symposium on Algorithms and Computation (ISAAC), Hainan, China.","DOI":"10.1007\/11602613_35"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1007\/s00453-001-0034-6","article-title":"Improving an algorithm for approximate pattern matching","volume":"30","author":"Navarro","year":"2001","journal-title":"Algorithmica"},{"key":"ref_40","unstructured":"Russo, L.M.S., Navarro, G., and Oliveira, A.L. (2007, January October). Approximate string matching with Lempel-Ziv compressed indexes. Proceedings of the 14th International Symposium on String Processing and Information Retrieval (SPIRE), Santiago, Chile."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Russo, L.M.S., Navarro, G., and Oliveira, A.L. (2008, January November). Indexed hierarchical approximate string matching. Proceedings of the 15th International Symposium on String Processing and Information Retrieval (SPIRE), Melbourne, Australia.","DOI":"10.1007\/978-3-540-89097-3_15"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Apostolico, A. (1985). Combinatorial Algorithms on Words, Springer-Verlag.","DOI":"10.1007\/978-3-642-82456-2"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Gusfield, D. (1997). Algorithms on Strings, Trees and Sequences, Cambridge University Press.","DOI":"10.1017\/CBO9780511574931"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1109\/TIT.1976.1055501","article-title":"On the complexity of finite sequences","volume":"22","author":"Lempel","year":"1976","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1109\/TIT.1977.1055714","article-title":"A universal algorithm for sequential data compression","volume":"23","author":"Ziv","year":"1977","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MC.1984.1659158","article-title":"A technique for high performance data compression","volume":"17","author":"Welch","year":"1984","journal-title":"IEEE Comput. Mag."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/0196-6774(85)90023-9","article-title":"Finding approximate patterns in strings","volume":"6","author":"Ukkonen","year":"1985","journal-title":"J. Algorithms"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1137\/S0097539794264810","article-title":"Incremental string comparison","volume":"27","author":"Landau","year":"1998","journal-title":"SIAM J. Comput."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1007\/s00453-003-1029-2","article-title":"Linear bidirectional on-line construction of affix trees","volume":"37","year":"2003","journal-title":"Algorithmica"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Navarro, G. (2009). Implementing the lz-index: Theory versus practice. ACM J. Exp. Algorithmics, 13, No. 2.","DOI":"10.1145\/1412228.1412230"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/0196-6774(85)90023-9","article-title":"Finding approximate patterns in strings","volume":"6","author":"Ukkonen","year":"1985","journal-title":"J. Algorithms"},{"key":"ref_52","first-page":"205","article-title":"A hybrid indexing method for approximate string matching","volume":"1","author":"Navarro","year":"2000","journal-title":"J. Discrete Algorithms"},{"key":"ref_53","unstructured":"Lee, S., and Park, K. (2008, January June). Dynamic rank-select structures with applications to run-length encoded texts. Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM), Pisa, Italy."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1145\/316542.316550","article-title":"A fast bit-vector algorithm for approximate string matching based on dynamic programming","volume":"46","author":"Myers","year":"1999","journal-title":"JACM"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/S0020-0190(99)00121-0","article-title":"Very fast and simple approximate string matching","volume":"72","author":"Navarro","year":"1999","journal-title":"Inf. Proc. Lett."},{"key":"ref_56","unstructured":"Raman, R., Raman, V., and Rao, S.S. (2002, January January). Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. Proceedings of the 13th annual ACM-SIAM symposium on Discrete algorithms, San Francisco, CA, USA."},{"key":"ref_57","first-page":"32:1","article-title":"Dynamic entropy-compressed sequences and full-text indexes","volume":"4","author":"Navarro","year":"2008","journal-title":"ACM Trans. Algorithms"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1145\/135239.135244","article-title":"Fast text searching allowing errors","volume":"35","author":"Wu","year":"1992","journal-title":"Commun. ACM"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/2\/3\/1105\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T22:11:09Z","timestamp":1760220669000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/2\/3\/1105"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,9,10]]},"references-count":58,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2009,9]]}},"alternative-id":["a2031105"],"URL":"https:\/\/doi.org\/10.3390\/a2031105","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2009,9,10]]}}}