{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:29:18Z","timestamp":1750220958567,"version":"3.41.0"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","license":[{"start":{"date-parts":[[2019,2,14]],"date-time":"2019-02-14T00:00:00Z","timestamp":1550102400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ACM J. Exp. Algorithmics"],"published-print":{"date-parts":[[2019,12,17]]},"abstract":"<jats:p>The suffix array, perhaps the most important data structure in modern string processing, needs to be augmented with the longest-common-prefix (LCP) array in many applications. Their construction is often a major bottleneck, especially when the data is too big for internal memory. We describe two new algorithms for computing the LCP array from the suffix array in external memory. Experiments demonstrate that the new algorithms are about a factor of two faster than the fastest previous algorithm. We then further engineer the two new algorithms and improve them in three ways. First, we speed up the algorithms by up to a factor of two through parallelism. Eight threads is sufficient for making the algorithms essentially I\/O bound. Second, we reduce the disk space usage of the algorithms making them in-place: the input (text and suffix array) is treated as read-only, and the working disk space never exceeds the size of the final output (the LCP array). Third, we add support for large alphabets. All previous implementations assume the byte alphabet.<\/jats:p>","DOI":"10.1145\/3297723","type":"journal-article","created":{"date-parts":[[2019,2,14]],"date-time":"2019-02-14T19:36:17Z","timestamp":1550172977000},"page":"1-27","source":"Crossref","is-referenced-by-count":2,"title":["Better External Memory LCP Array Construction"],"prefix":"10.1145","volume":"24","author":[{"given":"Juha","family":"K\u00e4rkk\u00e4inen","sequence":"first","affiliation":[{"name":"Helsinki Institute for Information Technology and University of Helsinki, Helsinki, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dominik","family":"Kempa","sequence":"additional","affiliation":[{"name":"Helsinki Institute for Information Technology and University of Helsinki, Helsinki, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,2,14]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1570-8667(03)00065-0"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33122-0_26"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jda.2012.07.007"},{"volume-title":"Proceedings of the 2013 Workshop on Algorithm Engineering and Experiments (ALENEX\u201913)","author":"Bingmann T.","key":"e_1_2_1_4_1","unstructured":"T. Bingmann , J. Fischer , and V. Osipov . 2013. Inducing suffix and LCP arrays in external memory . In Proceedings of the 2013 Workshop on Algorithm Engineering and Experiments (ALENEX\u201913) . 88--102. T. Bingmann, J. Fischer, and V. Osipov. 2013. Inducing suffix and LCP arrays in external memory. In Proceedings of the 2013 Workshop on Algorithm Engineering and Experiments (ALENEX\u201913). 88--102."},{"key":"e_1_2_1_5_1","volume-title":"Technical Report 124. Digital","author":"Burrows M.","year":"1994","unstructured":"M. Burrows and D. J. Wheeler . 1994 . A Block Sorting Lossless Data Compression Algorithm . Technical Report 124. Digital Equipment Corporation , Palo Alto, CA . M. Burrows and D. J. Wheeler. 1994. A Block Sorting Lossless Data Compression Algorithm. Technical Report 124. Digital Equipment Corporation, Palo Alto, CA."},{"key":"e_1_2_1_7_1","series-title":"Lecture Notes in Computer Science","volume-title":"Combinatorial Pattern Matching","author":"da Louza F. A.","unstructured":"F. A. da Louza , G. P. Telles , and C. D. de Aguiar Ciferri . 2013. External memory generalized suffix and LCP arrays construction . In Combinatorial Pattern Matching . Lecture Notes in Computer Science , Vol. 7922 . Springer , 201--210. F. A. da Louza, G. P. Telles, and C. D. de Aguiar Ciferri. 2013. External memory generalized suffix and LCP arrays construction. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 7922. Springer, 201--210."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1227161.1402296"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442536"},{"volume-title":"Proceedings of the 2011 Workshop on Algorithm Engineering and Experiments (ALENEX\u201911)","author":"Gog S.","key":"e_1_2_1_10_1","unstructured":"S. Gog and E. Ohlebusch . 2011. Fast and lightweight LCP-array construction algorithms . In Proceedings of the 2011 Workshop on Algorithm Engineering and Experiments (ALENEX\u201911) . 25--34. S. Gog and E. Ohlebusch. 2011. Fast and lightweight LCP-array construction algorithms. In Proceedings of the 2011 Workshop on Algorithm Engineering and Experiments (ALENEX\u201911). 25--34."},{"key":"e_1_2_1_11_1","unstructured":"G. H. Gonnet R. A. Baeza-Yates and T. Snider. 1992. New indices for text: Pat trees and pat arrays. In Information Retrieval: Data Structures 8 Algorithms W. B. Frakes and R. Baeza-Yates (Eds.). Prentice Hall 66--82.   G. H. Gonnet R. A. Baeza-Yates and T. Snider. 1992. New indices for text: Pat trees and pat arrays. In Information Retrieval: Data Structures 8 Algorithms W. B. Frakes and R. Baeza-Yates (Eds.). Prentice Hall 66--82."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 2nd International Conference on Algorithms for Big Data (ICABD\u201914)","volume":"1146","author":"K\u00e4rkk\u00e4inen J.","unstructured":"J. K\u00e4rkk\u00e4inen and D. Kempa . 2014. Engineering a lightweight external memory suffix array construction algorithm . In Proceedings of the 2nd International Conference on Algorithms for Big Data (ICABD\u201914) (CEUR Workshop Proceedings) , Vol. 1146 . 53--60. J. K\u00e4rkk\u00e4inen and D. Kempa. 2014. Engineering a lightweight external memory suffix array construction algorithm. In Proceedings of the 2nd International Conference on Algorithms for Big Data (ICABD\u201914) (CEUR Workshop Proceedings), Vol. 1146. 53--60."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-07959-2_35"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 24th Annual European Symposium on Algorithms (ESA\u201916)","volume":"57","author":"K\u00e4rkk\u00e4inen J.","unstructured":"J. K\u00e4rkk\u00e4inen and D. Kempa . 2016. Faster external memory LCP array construction . In Proceedings of the 24th Annual European Symposium on Algorithms (ESA\u201916) , Vol. 57 . 61:1--61:16. J. K\u00e4rkk\u00e4inen and D. Kempa. 2016. Faster external memory LCP array construction. In Proceedings of the 24th Annual European Symposium on Algorithms (ESA\u201916), Vol. 57. 61:1--61:16."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851491"},{"key":"e_1_2_1_16_1","volume-title":"String Processing and Information Retrieval. Lecture Notes in Computer Science","volume":"9954","author":"K\u00e4rkk\u00e4inen J.","unstructured":"J. K\u00e4rkk\u00e4inen and D. Kempa . 2016. LCP array construction using O(sort(n)) (or less) I\/Os . In String Processing and Information Retrieval. Lecture Notes in Computer Science , Vol. 9954 . Springer, 204--217. J. K\u00e4rkk\u00e4inen and D. Kempa. 2016. LCP array construction using O(sort(n)) (or less) I\/Os. In String Processing and Information Retrieval. Lecture Notes in Computer Science, Vol. 9954. Springer, 204--217."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 16th International Symposium on Experimental Algorithms (SEA\u201917)","volume":"75","author":"K\u00e4rkk\u00e4inen J.","unstructured":"J. K\u00e4rkk\u00e4inen and D. Kempa . 2017. Engineering external memory LCP array construction: Parallel, in-place and large alphabet . In Proceedings of the 16th International Symposium on Experimental Algorithms (SEA\u201917) , Vol. 75 . 17:1--17:14. J. K\u00e4rkk\u00e4inen and D. Kempa. 2017. Engineering external memory LCP array construction: Parallel, in-place and large alphabet. In Proceedings of the 16th International Symposium on Experimental Algorithms (SEA\u201917), Vol. 75. 17:1--17:14."},{"key":"e_1_2_1_18_1","volume-title":"Combinatorial Pattern Matching. Lecture Notes in Computer Science","volume":"9133","author":"K\u00e4rkk\u00e4inen J.","unstructured":"J. K\u00e4rkk\u00e4inen , D. Kempa , and M. Pia\u0327tkowski . 2015. Tighter bounds for the sum of irreducible LCP values . In Combinatorial Pattern Matching. Lecture Notes in Computer Science , Vol. 9133 . Springer, 316--328. J. K\u00e4rkk\u00e4inen, D. Kempa, and M. Pia\u0327tkowski. 2015. Tighter bounds for the sum of irreducible LCP values. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 9133. Springer, 316--328."},{"key":"e_1_2_1_19_1","volume-title":"Combinatorial Pattern Matching. Lecture Notes in Computer Science","volume":"9133","author":"K\u00e4rkk\u00e4inen J.","unstructured":"J. K\u00e4rkk\u00e4inen , D. Kempa , and S. J. Puglisi . 2015. Parallel external memory suffix sorting . In Combinatorial Pattern Matching. Lecture Notes in Computer Science , Vol. 9133 . Springer, 329--342. J. K\u00e4rkk\u00e4inen, D. Kempa, and S. J. Puglisi. 2015. Parallel external memory suffix sorting. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 9133. Springer, 329--342."},{"volume-title":"Proceedings of the 19th Workshop on Algorithm Engineering and Experiments (ALENEX\u201917)","author":"K\u00e4rkk\u00e4inen J.","key":"e_1_2_1_20_1","unstructured":"J. K\u00e4rkk\u00e4inen , D. Kempa , S. J. Puglisi , and B. Zhukova . 2017. Engineering external memory induced suffix sorting . In Proceedings of the 19th Workshop on Algorithm Engineering and Experiments (ALENEX\u201917) . 98--108. J. K\u00e4rkk\u00e4inen, D. Kempa, S. J. Puglisi, and B. Zhukova. 2017. Engineering external memory induced suffix sorting. In Proceedings of the 19th Workshop on Algorithm Engineering and Experiments (ALENEX\u201917). 98--108."},{"key":"e_1_2_1_21_1","volume-title":"Combinatorial Pattern Matching. Lecture Notes in Computer Science","volume":"5577","author":"K\u00e4rkk\u00e4inen J.","unstructured":"J. K\u00e4rkk\u00e4inen , G. Manzini , and S. J. Puglisi . 2009. Permuted longest-common-prefix array . In Combinatorial Pattern Matching. Lecture Notes in Computer Science , Vol. 5577 . Springer, 181--192. J. K\u00e4rkk\u00e4inen, G. Manzini, and S. J. Puglisi. 2009. Permuted longest-common-prefix array. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 5577. Springer, 181--192."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/1759210.1759301"},{"key":"e_1_2_1_23_1","volume-title":"Combinatorial Pattern Matching. Lecture Notes in Computer Science","volume":"2089","author":"Kasai T.","unstructured":"T. Kasai , G. Lee , H. Arimura , S. Arikawa , and K. Park . 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications . In Combinatorial Pattern Matching. Lecture Notes in Computer Science , Vol. 2089 . Springer, 181--192. T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 2089. Springer, 181--192."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-23826-5_9"},{"key":"e_1_2_1_25_1","first-page":"1","article-title":"Compact suffix array\u2014A space efficient full-text index","volume":"56","author":"M\u00e4kinen V.","year":"2003","unstructured":"V. M\u00e4kinen . 2003 . Compact suffix array\u2014A space efficient full-text index . Fundamenta Informaticae 56 , 1 -- 2 (2003), 191--210. V. M\u00e4kinen. 2003. Compact suffix array\u2014A space efficient full-text index. Fundamenta Informaticae 56, 1--2 (2003), 191--210.","journal-title":"Fundamenta Informaticae"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"V. M\u00e4kinen D. Belazzougui F. Cunial and A. I. Tomescu. 2015. Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing. Cambridge University Press.  V. M\u00e4kinen D. Belazzougui F. Cunial and A. I. Tomescu. 2015. Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing. Cambridge University Press.","DOI":"10.1017\/CBO9781139940023"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1137\/0222058"},{"key":"e_1_2_1_28_1","series-title":"Lecture Notes in Computer Science","volume-title":"Algorithm Theory","author":"Manzini G.","unstructured":"G. Manzini . 2004. Two space saving tricks for linear time LCP array computation . In Algorithm Theory . Lecture Notes in Computer Science , Vol. 3111 . Springer , 372--383. G. Manzini. 2004. Two space saving tricks for linear time LCP array computation. In Algorithm Theory. Lecture Notes in Computer Science, Vol. 3111. Springer, 372--383."},{"key":"e_1_2_1_29_1","series-title":"Lecture Notes in Computer Science","volume-title":"Foundations of Software Technology and Theoretical Computer Science","author":"Munro I.","unstructured":"I. Munro . 1996. Tables . In Foundations of Software Technology and Theoretical Computer Science . Lecture Notes in Computer Science , Vol. 1180 . Springer , 37--42. I. Munro. 1996. Tables. In Foundations of Software Technology and Theoretical Computer Science. Lecture Notes in Computer Science, Vol. 1180. Springer, 37--42."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1216370.1216372"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2699665"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2518175"},{"key":"e_1_2_1_33_1","volume-title":"Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction","author":"Ohlebusch E.","year":"2013","unstructured":"E. Ohlebusch . 2013 . Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction . Oldenbusch Verlag . E. Ohlebusch. 2013. Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction. Oldenbusch Verlag."},{"volume-title":"Proceedings of the 2007 Workshop on Algorithm Engineering and Experiments (ALENEX\u201907)","author":"Okanohara D.","key":"e_1_2_1_34_1","unstructured":"D. Okanohara and K. Sadakane . 2007. Practical entropy-compressed rank\/select dictionary . In Proceedings of the 2007 Workshop on Algorithm Engineering and Experiments (ALENEX\u201907) . D. Okanohara and K. Sadakane. 2007. Practical entropy-compressed rank\/select dictionary. In Proceedings of the 2007 Workshop on Algorithm Engineering and Experiments (ALENEX\u201907)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-92182-0_14"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201902)","author":"Sadakane K.","year":"2002","unstructured":"K. Sadakane . 2002 . Succinct representations of LCP information and improvements in the compressed suffix arrays . In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201902) . ACM, New York, NY, 225--232. K. Sadakane. 2002. Succinct representations of LCP information and improvements in the compressed suffix arrays. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201902). ACM, New York, NY, 225--232."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.37"},{"key":"e_1_2_1_38_1","series-title":"Lecture Notes in Computer Science","volume-title":"Combinatorial Pattern Matching","author":"Sir\u00e9n J.","unstructured":"J. Sir\u00e9n . 2010. Sampled longest common prefix array . In Combinatorial Pattern Matching . Lecture Notes in Computer Science , Vol. 6129 . Springer , 227--237. J. Sir\u00e9n. 2010. Sampled longest common prefix array. In Combinatorial Pattern Matching. Lecture Notes in Computer Science, Vol. 6129. Springer, 227--237."},{"key":"e_1_2_1_39_1","series-title":"Lecture Notes in Computer Science","volume-title":"String Processing and Information Retrieval","author":"Tischler G.","unstructured":"G. Tischler . 2016. Low space external memory construction of the succinct permuted longest common prefix array . In String Processing and Information Retrieval . Lecture Notes in Computer Science , Vol. 9954 . Springer , 178--190. G. Tischler. 2016. Low space external memory construction of the succinct permuted longest common prefix array. In String Processing and Information Retrieval. Lecture Notes in Computer Science, Vol. 9954. Springer, 178--190."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1561\/0400000014"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/42.3.193"}],"container-title":["ACM Journal of Experimental Algorithmics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3297723","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3297723","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:10Z","timestamp":1750204450000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3297723"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,14]]},"references-count":40,"alternative-id":["10.1145\/3297723"],"URL":"https:\/\/doi.org\/10.1145\/3297723","relation":{},"ISSN":["1084-6654","1084-6654"],"issn-type":[{"type":"print","value":"1084-6654"},{"type":"electronic","value":"1084-6654"}],"subject":[],"published":{"date-parts":[[2019,2,14]]}}}