{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T04:33:16Z","timestamp":1768105996638,"version":"3.49.0"},"reference-count":34,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,4,23]],"date-time":"2021-04-23T00:00:00Z","timestamp":1619136000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1703489"],"award-info":[{"award-number":["CCF-1703489"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Finding substrings of a text T that match a regular expression p is a fundamental problem. Despite being the subject of extensive research, no solution with a time complexity significantly better than O(|T||p|) has been found. Backurs and Indyk in FOCS 2016 established conditional lower bounds for the algorithmic problem based on the Strong Exponential Time Hypothesis that helps explain this difficulty. A natural question is whether we can improve the time complexity for matching the regular expression by preprocessing the text T? We show that conditioned on the Online Matrix\u2013Vector Multiplication (OMv) conjecture, even with arbitrary polynomial preprocessing time, a regular expression query on a text cannot be answered in strongly sublinear time, i.e., O(|T|1\u2212\u03b5) for any \u03b5&gt;0. Furthermore, if we extend the OMv conjecture to a plausible conjecture regarding Boolean matrix multiplication with polynomial preprocessing time, which we call Online Matrix\u2013Matrix Multiplication (OMM), we can strengthen this hardness result to there being no solution with a query time that is O(|T|3\/2\u2212\u03b5). These results hold for alphabet sizes three or greater. We then provide data structures that answer queries in O(|T||p|\u03c4) time where \u03c4\u2208[1,|T|] is fixed at construction. These include a solution that works for all regular expressions with Exp\u03c4\u00b7|T| preprocessing time and space. For patterns containing only \u2018concatenation\u2019 and \u2018or\u2019 operators (the same type used in the hardness result), we provide (1) a deterministic solution which requires Exp\u03c4\u00b7|T|log2|T| preprocessing time and space, and (2) when |p|\u2264|T|z for z=2o(log|T|), a randomized solution with amortized query time which answers queries correctly with high probability, requiring Exp\u03c4\u00b7|T|2\u03a9log|T| preprocessing time and space.<\/jats:p>","DOI":"10.3390\/a14050133","type":"journal-article","created":{"date-parts":[[2021,4,23]],"date-time":"2021-04-23T12:08:30Z","timestamp":1619179710000},"page":"133","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Text Indexing for Regular Expression Matching"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1493-5432","authenticated-orcid":false,"given":"Daniel","family":"Gibney","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA"}]},{"given":"Sharma V.","family":"Thankachan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Banchs, R.E. (2012). Text Mining with MATLAB\u00ae, Springer Science & Business Media.","DOI":"10.1007\/978-1-4614-4151-9"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Chapman, C., and Stolee, K.T. (2016, January 18\u201320). Exploring regular expression usage and context in Python. Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA 2016), Saarbr\u00fccken, Germany.","DOI":"10.1145\/2931037.2931073"},{"key":"ref_3","unstructured":"Donovan, A.A., and Kernighan, B.W. (2015). The Go Programming Language, Addison-Wesley Professional."},{"key":"ref_4","unstructured":"Friedl, J.E.F. (2006). Mastering Regular Expressions\u2014Understand Your Data and Be more Productive: For Perl, PHP, Java, .NET, Ruby, O\u2019Reilly Media."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Goyvaerts, J., and Levithan, S. (2012). Regular Expressions Cookbook\u2014Detailed Solutions in Eight Programming Languages, O\u2019Reilly Media. [2nd ed.].","DOI":"10.1016\/S1353-4858(12)70100-9"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1002\/spe.841","article-title":"FIRE\/J\u2014Optimizing regular expression searches with generative programming","volume":"38","author":"Karakoidas","year":"2008","journal-title":"Softw. Pract. Exp."},{"key":"ref_7","unstructured":"Cox, R. (2021, March 03). Regular Expression Matching with a Trigram Index or How Google Code Search Worked. Available online: https:\/\/swtch.com\/~rsc\/regexp\/regexp4.html."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1109\/32.489076","article-title":"On \u201cA Framework for Source Code Search Using Program Patterns\u201d","volume":"21","author":"Devanbu","year":"1995","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Umarji, M., and Sim, S.E. (2008). Archetypal Internet-Scale Source Code Searching. Finding Source Code on the Web for Remix and Reuse, Springer.","DOI":"10.1007\/978-0-387-09684-1_21"},{"key":"ref_10","unstructured":"Chodorow, K., and Dirolf, M. (2010). MongoDB\u2014The Definitive Guide: Powerful and Scalable Data Storage, O\u2019Reilly Media."},{"key":"ref_11","unstructured":"Garofalakis, M.N., Rastogi, R., and Shim, K. (1999, January 7\u201310). SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. Proceedings of the 25th International Conference on Very Large Data Bases (VLDB\u201999), Edinburgh, Scotland, UK."},{"key":"ref_12","unstructured":"Obe, R., and Hsu, L. (2012). PostgreSQL\u2014Up and Running: A Practical Guide to the Advanced Open Source Database, O\u2019Reilly Media."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Backurs, A., and Indyk, P. (2016, January 9\u201311). Which Regular Expression Patterns Are Hard to Match?. Proceedings of the 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), New Brunswick, NJ, USA.","DOI":"10.1109\/FOCS.2016.56"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Henzinger, M., Krinninger, S., Nanongkai, D., and Saranurak, T. (2015, January 14\u201317). Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC 2015), Portland, OR, USA.","DOI":"10.1145\/2746539.2746609"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Abboud, A., and Dahlgaard, S. (2016, January 9\u201311). Popular Conjectures as a Barrier for Dynamic Planar Graph Algorithms. Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS 2016), New Brunswick, NJ, USA.","DOI":"10.1109\/FOCS.2016.58"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Berkholz, C., Keppeler, J., and Schweikardt, N. (2017, January 14\u201319). Answering Conjunctive Queries under Updates. Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2017), Chicago, IL, USA.","DOI":"10.1145\/3034786.3034789"},{"key":"ref_17","unstructured":"Berkholz, C., Keppeler, J., and Schweikardt, N. (2018, January 26\u201329). Answering UCQs under Updates and in the Presence of Integrity Constraints. Proceedings of the 21st International Conference on Database Theory (ICDT 2018), Vienna, Austria."},{"key":"ref_18","unstructured":"Henzinger, M., and Neumann, S. (2016, January 22\u201324). Incremental and Fully Dynamic Subgraph Connectivity For Emergency Planning. Proceedings of the 24th Annual European Symposium on Algorithms (ESA 2016), Aarhus, Denmark."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"van den Brand, J., Nanongkai, D., and Saranurak, T. (2019, January 9\u201312). Dynamic Matrix Inverse: Improved Algorithms and Matching Conditional Lower Bounds. Proceedings of the 60th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2019), Baltimore, MD, USA.","DOI":"10.1109\/FOCS.2019.00036"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Larsen, K.G., and Williams, R.R. (2017, January 16\u201319). Faster Online Matrix-Vector Multiplication. Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2017), Barcelona, Spain.","DOI":"10.1137\/1.9781611974782.142"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Blelloch, G.E., Vassilevska, V., and Williams, R. (2008). A New Combinatorial Approach for Sparse Graph Problems. Automata, Languages and Programming, Proceedings of the 35th International Colloquium (ICALP 2008), Reykjavik, Iceland, 7\u201311 July 2008, Springer. Part I: Tack A: Algorithms, Automata, Complexity, and Games.","DOI":"10.1007\/978-3-540-70575-8_10"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1145\/1077464.1077466","article-title":"Fast sparse matrix multiplication","volume":"1","author":"Yuster","year":"2005","journal-title":"ACM Trans. Algorithms"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Aho, A.V. (1990). Algorithms for Finding Patterns in Strings. Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, Elsevier.","DOI":"10.1016\/B978-0-444-88071-0.50010-2"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Tsang, D., and Chawla, S. (2011). An index for regular expression queries: Design and implementation. arXiv.","DOI":"10.1145\/2063576.2063968"},{"key":"ref_25","unstructured":"Cho, J., and Rajagopalan, S. (March, January 26). A Fast Regular Expression Indexing Engine. Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bille, P. (2006). New Algorithms for Regular Expression Matching. Automata, Languages and Programming, Proceedings of the 33rd International Colloquium (ICALP 2006), Venice, Italy, 10\u201314 July 2006, Springer. Part I.","DOI":"10.1007\/11786986_56"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1145\/363347.363387","article-title":"Regular Expression Search Algorithm","volume":"11","author":"Thompson","year":"1968","journal-title":"Commun. ACM"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"430","DOI":"10.1145\/128749.128755","article-title":"A Four Russians Algorithm for Regular Expression Pattern Matching","volume":"39","author":"Myers","year":"1992","journal-title":"J. ACM"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Bille, P., and Thorup, M. (2009). Faster Regular Expression Matching. Automata, Languages and Programming, Proceedings of the 36th International Colloquium (ICALP 2009), Rhodes, Greece, 5\u201312 July 2009, Springer. Part I.","DOI":"10.1007\/978-3-642-02927-1_16"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Bille, P., and Thorup, M. (2010, January 17\u201319). Regular Expression Matching with Multi-Strings and Intervals. Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2010), Austin, TX, USA.","DOI":"10.1137\/1.9781611973075.104"},{"key":"ref_31","unstructured":"Cox, R. (2021, March 03). Regular Expression Matching Can Be Simple and Fast (but Is Slow in java, perl, php, python, ruby,...). Available online: http:\/\/swtch.com\/rsc\/regexp\/regexp1.html."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bringmann, K., Gr\u00f8nlund, A., and Larsen, K.G. (2017, January 15\u201317). A Dichotomy for Regular Expression Membership Testing. Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2017), Berkeley, CA, USA.","DOI":"10.1109\/FOCS.2017.36"},{"key":"ref_33","unstructured":"Goldstein, I., Kopelowitz, T., Lewenstein, M., and Porat, E. (2016, January 22\u201324). How Hard is it to Find (Honest) Witnesses?. Proceedings of the 24th Annual European Symposium on Algorithms (ESA 2016), Aarhus, Denmark."},{"key":"ref_34","unstructured":"Williams, R. (2007, January 7\u20139). Matrix-vector multiplication in sub-quadratic time: (Some preprocessing required). Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2007), New Orleans, LA, USA."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/5\/133\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:52:02Z","timestamp":1760161922000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/5\/133"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,23]]},"references-count":34,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["a14050133"],"URL":"https:\/\/doi.org\/10.3390\/a14050133","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,23]]}}}