{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T16:02:51Z","timestamp":1770480171308,"version":"3.49.0"},"reference-count":33,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2009,11,25]],"date-time":"2009-11-25T00:00:00Z","timestamp":1259107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>We consider grammar-based text compression with longest first substitution (LFS), where non-overlapping occurrences of a longest repeating factor of the input text are replaced by a new non-terminal symbol. We present the first linear-time algorithm for LFS. Our algorithm employs a new data structure called sparse lazy suffix trees. We also deal with a more sophisticated version of LFS, called LFS2, that allows better compression. The first linear-time algorithm for LFS2 is also presented.<\/jats:p>","DOI":"10.3390\/a2041429","type":"journal-article","created":{"date-parts":[[2009,11,25]],"date-time":"2009-11-25T16:11:30Z","timestamp":1259165490000},"page":"1429-1448","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Linear-Time Text Compression by Longest-First Substitution"],"prefix":"10.3390","volume":"2","author":[{"given":"Ryosuke","family":"Nakamura","sequence":"first","affiliation":[{"name":"Department of Informatics, Kyushu University, 744 Motooka, Fukuoka 819-0395, Japan"}]},{"given":"Shunsuke","family":"Inenaga","sequence":"additional","affiliation":[{"name":"Graduate School of Information Science and Electrical Engineering, Kyushu University, 744 Motooka, Fukuoka 819-0395, Japan"}]},{"given":"Hideo","family":"Bannai","sequence":"additional","affiliation":[{"name":"Department of Informatics, Kyushu University, 744 Motooka, Fukuoka 819-0395, Japan"}]},{"given":"Takashi","family":"Funamoto","sequence":"additional","affiliation":[{"name":"Department of Informatics, Kyushu University, 744 Motooka, Fukuoka 819-0395, Japan"}]},{"given":"Masayuki","family":"Takeda","sequence":"additional","affiliation":[{"name":"Department of Informatics, Kyushu University, 744 Motooka, Fukuoka 819-0395, Japan"}]},{"given":"Ayumi","family":"Shinohara","sequence":"additional","affiliation":[{"name":"Graduate School of Information Sciences, Tohoku University, Aoba 6-6-05, Aramaki, Sendai 980-8579, Japan"}]}],"member":"1968","published-online":{"date-parts":[[2009,11,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/S0304-3975(02)00426-7","article-title":"Collage system: a unifying framework for compressed pattern matching","volume":"298","author":"Kida","year":"2003","journal-title":"Theoretical Computer Science"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1007\/s00453-002-1005-2","article-title":"Approximate Matching of Run-Length Compressed Strings","volume":"35","author":"Ukkonen","year":"2003","journal-title":"Algorithmica"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1007\/978-3-540-73437-6_24","article-title":"Processing Compressed Texts: A Tractability Border","volume":"Vol. 4580","author":"Lifshits","year":"2007","journal-title":"Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM\u201907)"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"900","DOI":"10.1016\/j.tcs.2008.12.016","article-title":"Efficient Algorithms to Compute Compressed Longest Common Substrings and Compressed Palindromes","volume":"410","author":"Matsubara","year":"2009","journal-title":"Theoretical Computer Science"},{"key":"ref_5","unstructured":"Hermelin, D., Landau, G. M., Landau, S., and Weimann, O. (2009). Proc. 26th International Symposium on Theoretical Aspects of Computer Science (STACS\u201909)."},{"key":"ref_6","first-page":"19","article-title":"Testing Square-Freeness of Strings Compressed by Balanced Straight Line Program","volume":"Vol. 94","author":"Matsubara","year":"2009","journal-title":"Proc. 15th Computing: The Australasian Theory Symposium (CATS\u201909)"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1613\/jair.374","article-title":"Identifying hierarchical structure in sequences: a linear-time algorithm","volume":"7","author":"Witten","year":"1997","journal-title":"J. Artificial Intelligence Research"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1745","DOI":"10.1109\/5.892710","article-title":"Online and offline heuristics for inferring hierarchies of repetitions in sequences","volume":"88","author":"Witten","year":"2000","journal-title":"Proc. IEEE"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/bioinformatics\/btp117","article-title":"Textual data compression in computational biology: a synopsis","volume":"25","author":"Giancarlo","year":"2009","journal-title":"Bioinformatics"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1109\/18.841160","article-title":"Grammar-based codes: A new class of universal lossless source codes","volume":"46","author":"Kieffer","year":"2000","journal-title":"IEEE Transactions on Information Theory"},{"key":"ref_11","unstructured":"Storer, J. (1977). NP-completeness Results Concerning Data Compression, Technical Report 234, Department of Electrical Engineering and Computer Science, Princeton University."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1109\/TIT.1978.1055934","article-title":"Compression of individual sequences via variable-rate coding","volume":"24","author":"Ziv","year":"1978","journal-title":"IEEE Trans. Information Theory"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MC.1984.1659158","article-title":"A Technique for High-Performance Data Compression","volume":"17","author":"Welch","year":"1984","journal-title":"IEEE Computer"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1227","DOI":"10.1109\/18.850665","article-title":"Universal lossless compression via multilevel pattern matching","volume":"46","author":"Kieffer","year":"2000","journal-title":"IEEE Transactions on Information Theory"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1016\/j.jda.2004.08.016","article-title":"A fully linear-time approximation algorithm for grammar-based compression","volume":"3","author":"Sakamoto","year":"2005","journal-title":"Journal of Discrete Algorithms"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/S0304-3975(02)00777-6","article-title":"Application of Lempel-Ziv factorization to the approximation of grammar-based compression","volume":"302","author":"Rytter","year":"2003","journal-title":"Theoretical Computer Science"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1587\/transinf.E92.D.158","article-title":"A Space-Saving Approximation Algorithm for Grammar-Based Compression","volume":"E92-D","author":"Sakamoto","year":"2009","journal-title":"IEICE Trans. on Information and Systems"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1007\/978-3-540-89097-3_5","article-title":"Context-Sensitive Grammar Transform: Compression and Pattern Matching","volume":"Vol. 5280","author":"Maruyama","year":"2008","journal-title":"Proc. 15th International Symposium on String Processing and Information Retrieval (SPIRE\u201908)"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1111\/j.2044-8295.1975.tb01442.x","article-title":"An algorithm for the segmentation for an artificial language analogue","volume":"66","author":"Wolff","year":"1975","journal-title":"British Journal of Psychology"},{"key":"ref_20","unstructured":"Larsson, N. J., and Moffat, A. (1999). Proc. Data Compression Conference \u201999 (DCC\u201999), IEEE Computer Society."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1733","DOI":"10.1109\/5.892709","article-title":"Off-Line Compression by Greedy Textual Substitution","volume":"88","author":"Apostolico","year":"2000","journal-title":"Proc. IEEE"},{"key":"ref_22","unstructured":"Apostolico, A., and Lonardi, S. (2000). Proc. Data Compression Conference \u201900 (DCC\u201900), IEEE Computer Society."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1109\/TIT.1977.1055714","article-title":"A Universal Algorithm for Sequential Data Compression","volume":"IT-23","author":"Ziv","year":"1977","journal-title":"IEEE Transactions on Information Theory"},{"key":"ref_24","unstructured":"Burrows, M., and Wheeler, D. (1994). A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation."},{"key":"ref_25","unstructured":"Nakamura, R., Bannai, H., Inenaga, S., and Takeda, M. (2007). Proc. Data Compression Conference \u201907 (DCC\u201907), IEEE Computer Society."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1007\/978-3-540-39984-1_11","article-title":"Linear-time off-line text compression by longest-first substitution","volume":"Vol. 2857","author":"Inenaga","year":"2003","journal-title":"Proc. 10th International Symposium on String Processing and Information Retrieval (SPIRE\u201903)"},{"key":"ref_27","unstructured":"Bentley, J., and McIlroy, D. (1999). Proc. Data Compression Conference \u201999 (DCC\u201999), IEEE Computer Society."},{"key":"ref_28","unstructured":"Lanctot, J. K., Li, M., and Yang, E.-H. (2000). Proc. 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201900)."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1007\/BF01206331","article-title":"On-line Construction of Suffix Trees","volume":"14","author":"Ukkonen","year":"1995","journal-title":"Algorithmica"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1007\/3-540-61332-3_155","article-title":"Sparse Suffix Trees","volume":"Vol. 1090","author":"Ukkonen","year":"1996","journal-title":"Proc. 2nd Annual International Computing and Combinatorics Conference (COCOON\u201996)"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1007\/BF01955046","article-title":"Data structures and algorithms for the string statistics problem","volume":"15","author":"Apostolico","year":"1996","journal-title":"Algorithmica"},{"key":"ref_32","first-page":"728","article-title":"Solving the String Stastistics Problem in Time O(n log n)","volume":"Vol. 2380","author":"Pedersen","year":"2002","journal-title":"Proc. 29th International Colloquium on Automata,Languages, and Programming (ICALP\u201902)"},{"key":"ref_33","unstructured":"Lanctot, J. K. (2004). Some String Problems in Computational Biology. [PhD thesis, University ofWaterloo]."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/2\/4\/1429\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T22:11:46Z","timestamp":1760220706000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/2\/4\/1429"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,11,25]]},"references-count":33,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2009,12]]}},"alternative-id":["a2041429"],"URL":"https:\/\/doi.org\/10.3390\/a2041429","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,11,25]]}}}