{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T22:08:20Z","timestamp":1770415700487,"version":"3.49.0"},"reference-count":36,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2012,4,10]],"date-time":"2012-04-10T00:00:00Z","timestamp":1334016000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Grammar-based compression is a well-studied technique to construct a context-free grammar (CFG) deriving a given text uniquely. In this work, we propose an online algorithm for grammar-based compression. Our algorithm guarantees O(log2 n)- approximation ratio for the minimum grammar size, where n is an input size, and it runs in input linear time and output linear space. In addition, we propose a practical encoding, which transforms a restricted CFG into a more compact representation. Experimental results by comparison with standard compressors demonstrate that our algorithm is especially effective for highly repetitive text.<\/jats:p>","DOI":"10.3390\/a5020214","type":"journal-article","created":{"date-parts":[[2012,4,10]],"date-time":"2012-04-10T11:09:50Z","timestamp":1334056190000},"page":"214-235","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["An Online Algorithm for Lightweight Grammar-Based Compression"],"prefix":"10.3390","volume":"5","author":[{"given":"Shirou","family":"Maruyama","sequence":"first","affiliation":[{"name":"Department of Informatics, Kyushu University, 744 Motooka Nishi-ku Fukuoka-shi, Fukuoka, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3470-9187","authenticated-orcid":false,"given":"Hiroshi","family":"Sakamoto","sequence":"additional","affiliation":[{"name":"Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology,680-4 Kawazu Iizuka-shi Fukuoka, Japan"}]},{"given":"Masayuki","family":"Takeda","sequence":"additional","affiliation":[{"name":"Department of Informatics, Kyushu University, 744 Motooka Nishi-ku Fukuoka-shi, Fukuoka, Japan"}]}],"member":"1968","published-online":{"date-parts":[[2012,4,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1109\/18.841160","article-title":"Grammar-based codes: A new class of universal lossless source codes","volume":"46","author":"Kieffer","year":"2000","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1109\/TIT.1977.1055714","article-title":"A universal algorithm for sequential data compression","volume":"23","author":"Ziv","year":"1977","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_3","unstructured":"Sir\u1ebdn, J., V\u00e4lim\u00e4ki, N., M\u00e4kinenn, V., and Navarro, G. (2008, January 10\u201312). Run-Length Compressed Indexes are Superior for Highly Repetitive Sequence Collections. Proceedings of the 15th International Symposium on String Processing and Information Retrievals (SPIRE \u201908), Melbourne, Australia."},{"key":"ref_4","unstructured":"Claude, F., Feri\u00f1a, A., Martinez-Prieto, M.A., and Navarro, G. (June, January 31). Compressed q-Gram Indexing for Highly Repetitive Biological Sequences. Proceedings of the 10th IEEE International Conference on Bioinformatics and Bioengineering (BIBE \u201910), Philadelphia, PA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Kreft, S., and Navarro, G. (2011, January 27\u201329). Self-Indexing Based on LZ77. Proceedings of the 22th Annual Symposium on Combinatorial Pattern Matching (CPM \u201911), Palermo, Italy.","DOI":"10.1007\/978-3-642-21458-5_6"},{"key":"ref_6","first-page":"172","article-title":"An efficient pattern matching algorithm strings with short descriptions","volume":"4","author":"Karpinski","year":"1997","journal-title":"Nordic J. Comput."},{"key":"ref_7","unstructured":"Miyazaki, M., Shinohara, A., and Takeda, M. (July, January 30). An Improved Pattern Matching Algorithm for Strings in Terms of Straight-Line Programs. Proceedings of the 8th Annual Symposium on Combinatorial Pattern Matching (CPM \u201997), CPM 97, Aarhus, Denmark."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"C\u00e9gielski, P., Guessarian, I., Lifshits, Y., and Matiyasevich, Y. (2006, January 8\u201312). Window Subsequence Problems for Compressed Texts. Proceedings of the 1st International Computer Science Symposium in Russia (CSR \u201906), St. Petersburg, Russia.","DOI":"10.1007\/11753728_15"},{"key":"ref_9","unstructured":"Lifshits, Y. (2007, January 9\u201311). Processing Compressed Texts: A Tractability Border. Proceedings of the 18th Annual Symposium on Combinatorial Pattern Matching (CPM \u201907), London, Canada."},{"key":"ref_10","unstructured":"Tiskin, A. (2011, January 14\u201318). Towards Approximate Matching in Compressed Strings. Proceedings of the 6th International Computer Science Symposium in Russia (CSR \u201911), St. Petersburg, Russia."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Yamamoto, T., Bannai, H., Inenaga, S., and Takeda, M. (2011, January 27\u201329). Faster Subsequence and Don\u2019t-Care Pattern Matching on Compressed Texts. Proceedings of the 22th Annual Symposium on Combinatorial Pattern Matching (CPM \u201911), Palermo, Italy.","DOI":"10.1007\/978-3-642-21458-5_27"},{"key":"ref_12","unstructured":"Hermelin, D., Landau, G.M., Landau, S., and Weimann, O. (2009, January 26\u201328). A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression. Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS \u201909), Freiburg, Germany."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Goto, K., Bannai, H., Inenaga, S., and Takeda, M. (2011, January 17\u201321). Fast q-Gram Mining on SLP Compressed Strings. Proceedings of the 18th International Symposium on String Processing and Information Retrieval (SPIRE \u201911), Pisa, Italy.","DOI":"10.1007\/978-3-642-24583-1_27"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Goto, K., Bannai, H., Inenaga, S., and Takeda, M. (2012, January 21\u201327). Computing q-Gram Non-Overlapping Frequencies on SLP Compressed Texts. Proceedings of the 38th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM \u201912), Spindlerv Ml\u00fdn, Czech Republic.","DOI":"10.1007\/978-3-642-27660-6_25"},{"key":"ref_15","unstructured":"Inenaga, S., and Bannai, H. (2, January 31). Finding Characteristic Substrings from Compressed Texts. Proceedings of the Prague Stringology Conference (PSC \u201909), Prague, Czech Republic."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"900","DOI":"10.1016\/j.tcs.2008.12.016","article-title":"Efficient algorithms to compute compressed longest common substrings and compressed palindromes","volume":"410","author":"Matsubara","year":"2009","journal-title":"Theor. Comput. Sci."},{"key":"ref_17","unstructured":"Lehman, E., and Shelat, A. (2002, January 6\u20138). Approximation Algorithms for Grammar-Based Compression. Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA \u201902), San Francisco, CA, USA."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1613\/jair.374","article-title":"Identifying hierarchical strcture in sequences: A linear-time algorithm","volume":"7","author":"Witten","year":"1997","journal-title":"J. Artif. Intell. Res."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1722","DOI":"10.1109\/5.892708","article-title":"Off-line dictionary-based compression","volume":"88","author":"Larsson","year":"2000","journal-title":"Proc. IEEE"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1733","DOI":"10.1109\/5.892709","article-title":"Offline compression by greedy textual substitution","volume":"88","author":"Apostolico","year":"2000","journal-title":"Proc. IEEE"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1429","DOI":"10.3390\/a2041429","article-title":"Linear-time off-line text compression by longest-first substitution","volume":"2","author":"Nakamura","year":"2009","journal-title":"Algorithms"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2554","DOI":"10.1109\/TIT.2005.850116","article-title":"The smallest grammar problem","volume":"51","author":"Charikar","year":"2005","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/S0304-3975(02)00777-6","article-title":"Application of Lempel-Ziv factorization to the approximation of grammar-based compression","volume":"302","author":"Rytter","year":"2003","journal-title":"Theor. Comput. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1016\/j.jda.2004.08.016","article-title":"A fully linear-time approximation algorithm for grammar-based compression","volume":"3","author":"Sakamoto","year":"2005","journal-title":"J. Discret. Algorithms"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Sakamoto, H., Kida, T., and Shimozono, S. (2004, January 5\u20138). A Space-Saving Linear-Time Algorithm for Grammar-Based Compression. Proceedings of the 11th International Symposium on String Processing and Information Retrieval (SPIRE \u201904), Padova, Italy.","DOI":"10.1007\/978-3-540-30213-1_33"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1587\/transinf.E92.D.158","article-title":"A space-saving approximation algorithm for grammar-based compression","volume":"E92-D","author":"Sakamoto","year":"2009","journal-title":"IEICE Trans. Inf. Syst."},{"key":"ref_27","unstructured":"Gagie, T., and Gawrychowski, P. (2010, January 24\u201328). Grammar-Based Compression in a Streaming Model. Proceedings of the 4th International Conference on Language and Automata Theory and Applications (LATA \u201910), Trier, Germany."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences, Cambridge University Press.","DOI":"10.1017\/CBO9780511574931"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Dietzfelbinger, M., Karlin, A., Mehlhorn, K., auf der Heide, F.M., Rohnert, H., and Tarjan, R.E. (1988, January 24\u201326). Dynamic Perfect Hashing: Upper and Lower Bounds. Proceedings of the 29th Annual Symposium on Foundations of Computer Science (FOCS \u201988), White Plains, New York, USA.","DOI":"10.1109\/SFCS.1988.21968"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Welch, T. (1984). A technique for high-performance data compression. IEEE Comput., 8\u201319.","DOI":"10.1109\/MC.1984.1659158"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1109\/TIT.1978.1055934","article-title":"Compression of individual sequences via variable-rate coding","volume":"24","author":"Ziv","year":"1978","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_32","unstructured":"Burrows, M., and Wheeler, D. A Block Sorting Lossless Data Compression Algorithm; Technical Report 124, Digital Equipment Corporation, 1994."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/S0304-3975(02)00426-7","article-title":"Collage systems: A unifying framework for compressed pattern matching","volume":"298","author":"Kida","year":"2003","journal-title":"Theor. Comput. Sci."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"313","DOI":"10.3233\/FI-2011-565","article-title":"Self-Indexed Grammar-Based Compression","volume":"3","author":"Claude","year":"2011","journal-title":"Fundam. Inform."},{"key":"ref_35","unstructured":"Gagie, T., Gawrychowski, P., and Puglisi, S.J. (2012, January 5\u20139). Faster Grammar-Based Self-Index. Proceedings of the 6th International Conference on Language and Automata Theory and Applications (LATA \u201912), A Coru\u00f1a, Spain."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., and Weimann, O. (2011, January 23\u201325). Random Access to Grammar-Compressed Strings. Proceedings of the 22th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA \u201911), San Francisco, California, USA.","DOI":"10.1137\/1.9781611973082.30"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/5\/2\/214\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T11:19:07Z","timestamp":1770376747000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/5\/2\/214"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4,10]]},"references-count":36,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2012,6]]}},"alternative-id":["a5020214"],"URL":"https:\/\/doi.org\/10.3390\/a5020214","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,4,10]]}}}