{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T08:37:13Z","timestamp":1765960633456,"version":"3.41.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","license":[{"start":{"date-parts":[[2009,12,1]],"date-time":"2009-12-01T00:00:00Z","timestamp":1259625600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Suomen Akatemia","doi-asserted-by":"publisher","award":["119815"],"award-info":[{"award-number":["119815"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ACM J. Exp. Algorithmics"],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:p>\n            Suffix tree is one of the most important data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, often the main memory size becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length\n            <jats:italic>n<\/jats:italic>\n            from alphabet \u03a3 = {\n            <jats:italic>A<\/jats:italic>\n            ,\n            <jats:italic>C<\/jats:italic>\n            ,\n            <jats:italic>G<\/jats:italic>\n            ,\n            <jats:italic>T<\/jats:italic>\n            } can be stored in\n            <jats:italic>n<\/jats:italic>\n            log |\u03a3| = 2\n            <jats:italic>n<\/jats:italic>\n            bits, its suffix tree occupies\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>n<\/jats:italic>\n            log\n            <jats:italic>n<\/jats:italic>\n            ) bits. In practice, the size difference easily reaches factor 50.\n          <\/jats:p>\n          <jats:p>\n            We report on an implementation of the compressed suffix tree very recently proposed by Sadakane (2007). The compressed suffix tree occupies space proportional to the text size, that is,\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>n<\/jats:italic>\n            log |\u03a3|) bits, and supports all typical suffix tree operations with at most log\n            <jats:italic>n<\/jats:italic>\n            factor slowdown. Our experiments show that, for example, on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of the normal suffix tree. At the same time, a representative algorithm is slowed down by factor 30.\n          <\/jats:p>\n          <jats:p>\n            Our implementation follows the original proposal in spirit, but some internal parts are tailored toward practical implementation. Our construction algorithm has time requirement\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>n<\/jats:italic>\n            log\n            <jats:italic>n<\/jats:italic>\n            log |\u03a3|) and uses closely the same space as the final structure while constructing it: on the 10MB DNA sequence, the maximum space usage during construction is only 1.5 times the final product size. As by-products, we develop a method to create\n            <jats:italic>Succinct Suffix Array<\/jats:italic>\n            directly from Burrows-Wheeler transform and a space-efficient version of the suffixes-insertion algorithm to build balanced parentheses representation of suffix tree from LCP information.\n          <\/jats:p>","DOI":"10.1145\/1498698.1594228","type":"journal-article","created":{"date-parts":[[2010,4,7]],"date-time":"2010-04-07T02:56:32Z","timestamp":1270608992000},"source":"Crossref","is-referenced-by-count":13,"title":["Engineering a compressed suffix tree implementation"],"prefix":"10.1145","volume":"14","author":[{"given":"N.","family":"V\u00e4lim\u00e4ki","sequence":"first","affiliation":[{"name":"University of Helsinki, Helsinki, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"V.","family":"M\u00e4kinen","sequence":"additional","affiliation":[{"name":"University of Helsinki, Helsinki, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"W.","family":"Gerlach","sequence":"additional","affiliation":[{"name":"Bielefeld University, AG Genominformatik, Bielefeld"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"K.","family":"Dixit","sequence":"additional","affiliation":[{"name":"IIT Kanpur, New Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2010,1,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1570-8667(03)00065-0"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-2836(05)80360-2"},{"key":"e_1_2_1_3_1","series-title":"NATO ISI Series","volume-title":"Combinatorial Algorithms on Words","author":"Apostolico A.","unstructured":"Apostolico , A. 1985. The myriad virtues of subword trees . In Combinatorial Algorithms on Words . NATO ISI Series . Springer-Verlag , Berlin , 85--96. Apostolico, A. 1985. The myriad virtues of subword trees. In Combinatorial Algorithms on Words. NATO ISI Series. Springer-Verlag, Berlin, 85--96."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/646388.690192"},{"key":"e_1_2_1_5_1","unstructured":"Burrows M. and Wheeler D. 1994. A block sorting lossless data compression algorithm. Tech. rep. 124 Digital Equipment Corporation.  Burrows M. and Wheeler D. 1994. A block sorting lossless data compression algorithm. Tech. rep. 124 Digital Equipment Corporation."},{"volume-title":"Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM'04)","author":"Chan H.-L.","key":"e_1_2_1_6_1","unstructured":"Chan , H.-L. , Hon , W.-K. , and Lam , T . -W. 2004. Compressed index for a dynamic collection of texts . In Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM'04) . Springer-Verlag, Berlin, 445--456. Chan, H.-L., Hon, W.-K., and Lam, T.-W. 2004. Compressed index for a dynamic collection of texts. In Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM'04). Springer-Verlag, Berlin, 445--456."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2005.3"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007352.1007374"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Crochemore M. and Rytter W. 2002. Jewels of Stringology. World Scientific Hackensack NJ.  Crochemore M. and Rytter W. 2002. Jewels of Stringology. World Scientific Hackensack NJ.","DOI":"10.1142\/4838"},{"volume-title":"Universal codeword sets and representation of the integers","author":"Elias P.","key":"e_1_2_1_11_1","unstructured":"Elias , P. 1 975. Universal codeword sets and representation of the integers . IEEE Trans. Inf. Theor . 21, 2, 194--200. Elias, P. 1 975. Universal codeword sets and representation of the integers. IEEE Trans. Inf. Theor. 21, 2, 194--200."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1082036.1082039"},{"volume-title":"Proceedings of the 4th Workshop on Efficient and Experimental Algorithms (WEA'05)","author":"Gonz\u00e1lez R.","key":"e_1_2_1_13_1","unstructured":"Gonz\u00e1lez , R. , Grabowski , S. , M\u00e4kinen , V. , and Navarro , G . 2005. Practical implementation of rank and select queries . In Proceedings of the 4th Workshop on Efficient and Experimental Algorithms (WEA'05) . Springer-Verlag, Berlin, 27--38. Gonz\u00e1lez, R., Grabowski, S., M\u00e4kinen, V., and Navarro, G. 2005. Practical implementation of rank and select queries. In Proceedings of the 4th Workshop on Efficient and Experimental Algorithms (WEA'05). Springer-Verlag, Berlin, 27--38."},{"volume-title":"Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'03)","author":"Grossi R.","key":"e_1_2_1_14_1","unstructured":"Grossi , R. , Gupta , A. , and Vitter , J . 2003. High-order entropy-compressed text indexes . In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'03) . ACM, New York, 841--850. Grossi, R., Gupta, A., and Vitter, J. 2003. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'03). ACM, New York, 841--850."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539702402354"},{"volume-title":"Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology","author":"Gusfield D.","key":"e_1_2_1_16_1","unstructured":"Gusfield , D. 1997. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology . Cambridge University Press , Cambridge, UK . Gusfield, D. 1997. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, UK."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24587-2_26"},{"volume-title":"Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching (CPM'02)","author":"Hon W.-K.","key":"e_1_2_1_19_1","unstructured":"Hon , W.-K. and Sadakane , K . 2002. Space-economical algorithms for finding maximal unique matches . In Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching (CPM'02) . Springer-Verlag, Berlin, 144--152. Hon, W.-K. and Sadakane, K. 2002. Space-economical algorithms for finding maximal unique matches. In Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching (CPM'02). Springer-Verlag, Berlin, 144--152."},{"volume-title":"Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS'03)","author":"Hon W.-K.","key":"e_1_2_1_20_1","unstructured":"Hon , W.-K. , Sadakane , K. , and Sung , W . -K. 2003. Breaking a time-and-space barrier in constructing full-text indices . In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS'03) . IEEE, Los Alamitos, CA, 251--260. Hon, W.-K., Sadakane, K., and Sung, W.-K. 2003. Breaking a time-and-space barrier in constructing full-text indices. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS'03). IEEE, Los Alamitos, CA, 251--260."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.1989.63533"},{"key":"e_1_2_1_22_1","unstructured":"K\u00e4rkk\u00e4inen J. 2006. personal communication.  K\u00e4rkk\u00e4inen J. 2006. personal communication."},{"volume-title":"Proceedings of the 12th Annual Symposium Combinatorial Pattern Matching (CPM'01)","author":"Kasai T.","key":"e_1_2_1_23_1","unstructured":"Kasai , T. , Lee , G. , Arimura , H. , Arikawa , S. , and Park , K . 2001. Linear-time longest-common prefix computation in suffix arrays and its applications . In Proceedings of the 12th Annual Symposium Combinatorial Pattern Matching (CPM'01) . Springer-Verlag, Berlin, 181--192. Kasai, T., Lee, G., Arimura, H., Arikawa, S., and Park, K. 2001. Linear-time longest-common prefix computation in suffix arrays and its applications. In Proceedings of the 12th Annual Symposium Combinatorial Pattern Matching (CPM'01). Springer-Verlag, Berlin, 181--192."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/11780441_8"},{"key":"e_1_2_1_25_1","first-page":"40","article-title":"Succinct suffix arrays based on run-length encoding","volume":"12","author":"M\u00e4kinen V.","year":"2005","unstructured":"M\u00e4kinen , V. and Navarro , G. 2005 . Succinct suffix arrays based on run-length encoding . Nordic J. Comput. 12 , 1, 40 -- 66 . M\u00e4kinen, V. and Navarro, G. 2005. Succinct suffix arrays based on run-length encoding. Nordic J. Comput. 12, 1, 40--66.","journal-title":"Nordic J. Comput."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2007.07.013"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367064.1367072"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1137\/0222058"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/382780.382782"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/321941.321946"},{"key":"e_1_2_1_31_1","volume-title":"Tables. In Proceedings of the 16th Foundations of Software Technology and Theoretical Computer Science (FSTTCS'96)","author":"Munro I.","year":"1996","unstructured":"Munro , I. 1996 . Tables. In Proceedings of the 16th Foundations of Software Technology and Theoretical Computer Science (FSTTCS'96) . Springer-Verlag, Berlin, 37--42. Munro, I. 1996. Tables. In Proceedings of the 16th Foundations of Software Technology and Theoretical Computer Science (FSTTCS'96). Springer-Verlag, Berlin, 37--42."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.2000.1151"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1570-8667(03)00066-2"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1216370.1216372"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-006-1198-x"},{"volume-title":"Proceedings of the 7th Workshop on Algorithm Engineering and Experiments and the 2nd Workshop on Analytic Algorithmics and Combinatorics (ALENEX\/ANALCO'05)","author":"Sch\u00fcrmann K.-B.","key":"e_1_2_1_36_1","unstructured":"Sch\u00fcrmann , K.-B. and Stoye , J . 2005. An in-complex algorithm for fast suffix array construction . In Proceedings of the 7th Workshop on Algorithm Engineering and Experiments and the 2nd Workshop on Analytic Algorithmics and Combinatorics (ALENEX\/ANALCO'05) . SIAM, Philadelphia, 77--85. Sch\u00fcrmann, K.-B. and Stoye, J. 2005. An in-complex algorithm for fast suffix array construction. In Proceedings of the 7th Workshop on Algorithm Engineering and Experiments and the 2nd Workshop on Analytic Algorithmics and Combinatorics (ALENEX\/ANALCO'05). SIAM, Philadelphia, 77--85."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01206331"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/SWAT.1973.13"}],"container-title":["ACM Journal of Experimental Algorithmics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1498698.1594228","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1498698.1594228","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:38:38Z","timestamp":1750253918000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1498698.1594228"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,12]]},"references-count":36,"alternative-id":["10.1145\/1498698.1594228"],"URL":"https:\/\/doi.org\/10.1145\/1498698.1594228","relation":{},"ISSN":["1084-6654","1084-6654"],"issn-type":[{"type":"print","value":"1084-6654"},{"type":"electronic","value":"1084-6654"}],"subject":[],"published":{"date-parts":[[2009,12]]}}}