{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T00:48:40Z","timestamp":1760402920955,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2020,4,23]],"date-time":"2020-04-23T00:00:00Z","timestamp":1587600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>This study presents an analysis of RePair, which is a grammar compression algorithm known for its simple scheme, while also being practically effective. First, we show that the main process of RePair, that is, the step by step substitution of the most frequent symbol pairs, works within the corresponding most frequent maximal repeats. Then, we reveal the relation between maximal repeats and grammars constructed by RePair. On the basis of this analysis, we further propose a novel variant of RePair, called MR-RePair, which considers the one-time substitution of the most frequent maximal repeats instead of the consecutive substitution of the most frequent pairs. The results of the experiments comparing the size of constructed grammars and execution time of RePair and MR-RePair on several text corpora demonstrate that MR-RePair constructs more compact grammars than RePair does, especially for highly repetitive texts.<\/jats:p>","DOI":"10.3390\/a13040103","type":"journal-article","created":{"date-parts":[[2020,4,23]],"date-time":"2020-04-23T10:46:22Z","timestamp":1587638782000},"page":"103","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Practical Grammar Compression Based on Maximal Repeats"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1918-5802","authenticated-orcid":false,"given":"Isamu","family":"Furuya","sequence":"first","affiliation":[{"name":"Graduate School of IST, Hokkaido University, N14-W9, Kitaku, Sapporo 060-0814, Japan"}]},{"given":"Takuya","family":"Takagi","sequence":"additional","affiliation":[{"name":"Graduate School of IST, Hokkaido University, N14-W9, Kitaku, Sapporo 060-0814, Japan"}]},{"given":"Yuto","family":"Nakashima","sequence":"additional","affiliation":[{"name":"Department of Informatics, Kyushu University, 744, Motooka, Nishiku, Fukuoka 819-0395, Japan"}]},{"given":"Shunsuke","family":"Inenaga","sequence":"additional","affiliation":[{"name":"Department of Informatics, Kyushu University, 744, Motooka, Nishiku, Fukuoka 819-0395, Japan"},{"name":"PRESTO, Japan Science and Technology Agency, 4-1-8, Honcho, Kawaguchi, Saitama 332-0012, Japan"}]},{"given":"Hideo","family":"Bannai","sequence":"additional","affiliation":[{"name":"Department of Informatics, Kyushu University, 744, Motooka, Nishiku, Fukuoka 819-0395, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1666-3303","authenticated-orcid":false,"given":"Takuya","family":"Kida","sequence":"additional","affiliation":[{"name":"Graduate School of IST, Hokkaido University, N14-W9, Kitaku, Sapporo 060-0814, Japan"}]}],"member":"1968","published-online":{"date-parts":[[2020,4,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"2554","DOI":"10.1109\/TIT.2005.850116","article-title":"The smallest grammar problem","volume":"51","author":"Charikar","year":"2005","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1722","DOI":"10.1109\/5.892708","article-title":"Off-line dictionary-based compression","volume":"88","author":"Larsson","year":"2000","journal-title":"Proc. IEEE"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"16:1","DOI":"10.1145\/1841909.1841913","article-title":"Fast and compact web graph representations","volume":"4","author":"Claude","year":"2010","journal-title":"ACM Trans. Web"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1007\/978-3-540-73437-6_23","article-title":"Compressed text indexes with fast locate","volume":"Volume 4580","author":"Navarro","year":"2007","journal-title":"Annual Symposium on Combinatorial Pattern Matching (CPM 2007)"},{"key":"ref_5","unstructured":"Wan, R. (2003). Browsing and Searching Compressed Documents. [Ph.D. Thesis, The University of Melbourne]."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Masaki, T., and Kida, T. (April, January 29). Online Grammar Transformation Based on Re-Pair Algorithm. Proceedings of the Data Compression Conference (2016 DCC), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2016.69"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bille, P., G\u00f8rtz, I.L., and Prezza, N. (2017, January 4\u20137). Space-Efficient Re-Pair Compression. Proceedings of the Data Compression Conference (DCC 2017), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2017.24"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Sekine, K., Sasakawa, H., Yoshida, S., and Kida, T. (2014, January 26\u201328). Adaptive dictionary sharing method for Re-Pair algorithm. Proceedings of the Data Compression Conference (DCC 2014), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2014.73"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1150","DOI":"10.1016\/j.is.2013.06.006","article-title":"XML tree structure compression using RePair","volume":"38","author":"Lohrey","year":"2013","journal-title":"Inf. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Tabei, Y., Saigo, H., Yamanishi, Y., and Puglisi, S.J. (2016, January 13\u201317). Scalable partial least squares regression on grammar-compressed data matrices. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939864"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Navarro, G., and Russo, L.M. (2008, January 25\u201327). Re-pair Achieves High-Order Entropy. Proceedings of the Data Compression Conference (DCC 2008), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2008.79"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ochoa, C., and Navarro, G. (2018). RePair and All Irreducible Grammars are Upper Bounded by High-Order Empirical Entropy. IEEE Trans. Inf. Theory, 1\u20135.","DOI":"10.1109\/TIT.2018.2871452"},{"key":"ref_13","first-page":"26","article-title":"Composite Repetition-Aware Data Structures","volume":"Volume 9133","author":"Belazzougui","year":"2015","journal-title":"Annual Symposium on Combinatorial Pattern Matching (CPM 2015)"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1007\/978-3-319-67428-5_14","article-title":"Fast label extraction in the CDAWG","volume":"Volume 10508","author":"Belazzougui","year":"2017","journal-title":"International Symposium on String Processing and Information Retrieval (SPIRE 2017)"},{"key":"ref_15","first-page":"7:1","article-title":"Representing the Suffix Tree with the CDAWG","volume":"Volume 78","author":"Radoszewski","year":"2017","journal-title":"28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1007\/978-3-319-67428-5_26","article-title":"Linear-Size CDAWG: New Repetition-Aware Indexing and Grammar Compression","volume":"Volume 10508","author":"Takagi","year":"2017","journal-title":"24th International Symposium on String Processing and Information Retrieval (SPIRE 2017)"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1733","DOI":"10.1109\/5.892709","article-title":"Off-line compression by greedy textual substitution","volume":"88","author":"Apostolico","year":"2000","journal-title":"Proc. IEEE"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1007\/978-3-540-39984-1_11","article-title":"Linear-time off-line text compression by longest-first substitution","volume":"Volume 2857","author":"Inenaga","year":"2003","journal-title":"10th International Symposium on String Processing and Information Retrieval (SPIRE 2003)"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Nakamura, R., Bannai, H., Inenaga, S., and Takeda, M. (2007, January 27\u201329). Simple Linear-Time Off-Line Text Compression by Longest-First Substitution. Proceedings of the Data Compression Conference (DCC 2007), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2007.70"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ga\u0144czorz, M., and Je\u017c, A. (2017, January 4\u20137). Improvements on Re-Pair Grammar Compressor. Proceedings of the Data Compression Conference (DCC 2017), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2017.52"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"313","DOI":"10.3233\/FI-2011-565","article-title":"Self-Indexed Grammar-Based Compression","volume":"111","author":"Claude","year":"2011","journal-title":"Fundam. Inform."},{"key":"ref_22","unstructured":"Claude, F., and Navarro, G. (2012, January 21\u201325). Improved Grammar-Based Compressed Indexes. Proceedings of the 19th International Symposium on String Processing and Information Retrieval (SPIRE 2012), Cartagena de Indias, Colombia."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Gawrychowski, P. (2012, January 21\u201325). Faster Algorithm for Computing the Edit Distance between SLP-Compressed Strings. Proceedings of the 19th International Symposium on String Processing and Information Retrieval (SPIRE 2012), Cartagena de Indias, Colombia.","DOI":"10.1007\/978-3-642-34109-0_24"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1016\/j.jda.2012.07.006","article-title":"Fast q-gram mining on SLP compressed strings","volume":"18","author":"Goto","year":"2013","journal-title":"J. Discret. Algorithms"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1016\/j.tcs.2014.07.010","article-title":"Compact Q-gram Profiling of Compressed Strings","volume":"550","author":"Bille","year":"2014","journal-title":"Theor. Comput. Sci."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1016\/j.tcs.2015.01.019","article-title":"Compressed automata for dictionary matching","volume":"578","author":"Tomohiro","year":"2015","journal-title":"Theor. Comput. Sci."},{"key":"ref_27","first-page":"20:1","article-title":"Faster Fully Compressed Pattern Matching by Recompression","volume":"11","year":"2015","journal-title":"ACM Trans. Algorithms"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1007\/s00453-015-0068-9","article-title":"Compressed Subsequence Matching and Packed Tree Coloring","volume":"77","author":"Bille","year":"2017","journal-title":"Algorithmica"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Furuya, I., Takagi, T., Nakashima, Y., Inenaga, S., Bannai, H., and Kida, T. (2019, January 26\u201329). MR-RePair: Grammar Compression based on Maximal Repeats. Proceedings of the Data Compression Conference (DCC 2019), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2019.00059"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/4\/103\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:09:07Z","timestamp":1760364547000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/4\/103"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,23]]},"references-count":29,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,4]]}},"alternative-id":["a13040103"],"URL":"https:\/\/doi.org\/10.3390\/a13040103","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2020,4,23]]}}}