{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T02:18:55Z","timestamp":1773886735915,"version":"3.50.1"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA1","license":[{"start":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T00:00:00Z","timestamp":1680739200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["2008096"],"award-info":[{"award-number":["2008096"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2023,4,6]]},"abstract":"<jats:p>Regular expressions (regexes) are ubiquitous in modern software. There is a variety of implementation techniques for regex matching, which can be roughly categorized as (1) relying on backtracking search, or (2) being based on finite-state automata. The implementations that use backtracking are often chosen due to their ability to support advanced pattern-matching constructs. Unfortunately, they are known to suffer from severe performance problems. For some regular expressions, the running time for matching can be exponential in the size of the input text. In order to provide stronger guarantees of matching efficiency, automata-based regex matching is the preferred choice. However, even these regex engines may exhibit severe performance degradation for some patterns. The main reason for this is that regexes used in practice are not exclusively built from the classical regular constructs, i.e., concatenation, nondeterministic choice and Kleene's star. They involve additional constructs that provide succinctness and convenience of expression. The most common such construct is bounded repetition (also called counting), which describes the repetition of the pattern a fixed number of times.<\/jats:p>\n          <jats:p>In this paper, we propose a new algorithm for the efficient matching of regular expressions that involve bounded repetition. Our algorithms are based on a new model of automata, which we call nondeterministic bit vector automata (NBVA). This model is chosen to be expressively equivalent to nondeterministic counter automata with bounded counters, a very natural model for expressing patterns with bounded repetition. We show that there is a class of regular expressions with bounded repetition that can be matched in time that is independent from the repetition bounds. Our algorithms are general enough to cover the vast majority of challenging bounded repetitions that arise in practice. We provide an implementation of our approach in a regex engine, which we call BVA-Scan. We compare BVA-Scan against state-of-the-art regex engines on several real datasets.<\/jats:p>","DOI":"10.1145\/3586044","type":"journal-article","created":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T21:06:02Z","timestamp":1680815162000},"page":"492-521","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Regular Expression Matching using Bit Vector Automata"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5444-5924","authenticated-orcid":false,"given":"Alexis","family":"Le Glaunec","sequence":"first","affiliation":[{"name":"Rice University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0672-2998","authenticated-orcid":false,"given":"Lingkun","family":"Kong","sequence":"additional","affiliation":[{"name":"Rice University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1209-7738","authenticated-orcid":false,"given":"Konstantinos","family":"Mamouras","sequence":"additional","affiliation":[{"name":"Rice University, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,4,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/360825.360855"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(95)00182-4"},{"key":"e_1_2_1_3_1","volume-title":"https:\/\/www.gnu.org\/software\/gawk\/ Accessed","author":"Awk GNU","year":"2023","unstructured":"GNU Awk . 2022. GNU Awk . https:\/\/www.gnu.org\/software\/gawk\/ Accessed : March 11, 2023 . GNU Awk. 2022. GNU Awk. https:\/\/www.gnu.org\/software\/gawk\/ Accessed: March 11, 2023."},{"key":"e_1_2_1_4_1","volume-title":"Back Reference in PCRE. https:\/\/www.pcre.org\/original\/doc\/html\/pcrepattern.html#SEC19 Accessed","year":"2023","unstructured":"Backreferences. 2022. Back Reference in PCRE. https:\/\/www.pcre.org\/original\/doc\/html\/pcrepattern.html#SEC19 Accessed : March 11, 2023 . Backreferences. 2022. Back Reference in PCRE. https:\/\/www.pcre.org\/original\/doc\/html\/pcrepattern.html#SEC19 Accessed: March 11, 2023."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/135239.135243"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24622-0_5"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-75632-5_5"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1544012.1544037"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2006.270302"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00068"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/359842.359859"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.7"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/321239.321249"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-60508-7_21"},{"key":"e_1_2_1_15_1","volume-title":"ClamAV - Open Source Antivirus Engine. Website. https:\/\/www.clamav.net\/ Accessed","author":"AV.","year":"2023","unstructured":"Clam AV. 2023. ClamAV - Open Source Antivirus Engine. Website. https:\/\/www.clamav.net\/ Accessed : March 11, 2023 . ClamAV. 2023. ClamAV - Open Source Antivirus Engine. Website. https:\/\/www.clamav.net\/ Accessed: March 11, 2023."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-09510-1_10"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3342509"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2014.8"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03816-7_32"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1070\/RM1961v016n05ABEH004112"},{"key":"e_1_2_1_21_1","volume-title":"Runaway Regular Expressions: Catastrophic Backtracking. https:\/\/www.regular-expressions.info\/catastrophic.html accessed","author":"Goyvaerts Jan","year":"2023","unstructured":"Jan Goyvaerts . 2021. Runaway Regular Expressions: Catastrophic Backtracking. https:\/\/www.regular-expressions.info\/catastrophic.html accessed March 11, 2023 . Jan Goyvaerts. 2021. Runaway Regular Expressions: Catastrophic Backtracking. https:\/\/www.regular-expressions.info\/catastrophic.html accessed March 11, 2023."},{"key":"e_1_2_1_22_1","volume-title":"GNU Grep - Global Regular Expression Print. https:\/\/www.gnu.org\/software\/grep\/ Accessed","author":"Grep GNU","year":"2023","unstructured":"GNU Grep . 2022. GNU Grep - Global Regular Expression Print. https:\/\/www.gnu.org\/software\/grep\/ Accessed : March 11, 2023 . GNU Grep. 2022. GNU Grep - Global Regular Expression Print. https:\/\/www.gnu.org\/software\/grep\/ Accessed: March 11, 2023."},{"key":"e_1_2_1_23_1","volume-title":"PCRE2 - Perl Compatible Regular Expressions v2. https:\/\/www.pcre.org\/ Accessed","author":"Hazel Philip","year":"2023","unstructured":"Philip Hazel and Zoltan Herczeg . 2022. PCRE2 - Perl Compatible Regular Expressions v2. https:\/\/www.pcre.org\/ Accessed : March 11, 2023 . Philip Hazel and Zoltan Herczeg. 2022. PCRE2 - Perl Compatible Regular Expressions v2. https:\/\/www.pcre.org\/ Accessed: March 11, 2023."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-34175-6_24"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03466-4_15"},{"key":"e_1_2_1_26_1","volume-title":"Posix Syntax in PCRE. https:\/\/www.pcre.org\/original\/doc\/html\/pcrepattern.html Accessed","author":"Posix Syntax","year":"2023","unstructured":"Posix Syntax in PCRE. 2022. Posix Syntax in PCRE. https:\/\/www.pcre.org\/original\/doc\/html\/pcrepattern.html Accessed : March 11, 2023 . Posix Syntax in PCRE. 2022. Posix Syntax in PCRE. https:\/\/www.pcre.org\/original\/doc\/html\/pcrepattern.html Accessed: March 11, 2023."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.312.0249"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1137\/0206024"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523456"},{"key":"e_1_2_1_30_1","unstructured":"CsA Automata Library. 2021. CsA Automata Library. https:\/\/pajda.fit.vutbr.cz\/ituronova\/countingautomata \t\t\t\t  CsA Automata Library. 2021. CsA Automata Library. https:\/\/pajda.fit.vutbr.cz\/ituronova\/countingautomata"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-72016-2_18"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-88494-9_8"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2020.3013053"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/SWAT.1971.11"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/SWAT.1972.29"},{"key":"e_1_2_1_36_1","volume-title":"RE2: Google\u2019s regular expression library. Website. https:\/\/github.com\/google\/re2 Accessed","year":"2023","unstructured":"RE2. 2023. RE2: Google\u2019s regular expression library. Website. https:\/\/github.com\/google\/re2 Accessed : March 11, 2023 . RE2. 2023. RE2: Google\u2019s regular expression library. Website. https:\/\/github.com\/google\/re2 Accessed: March 11, 2023."},{"key":"e_1_2_1_37_1","volume-title":"Regular Expression Library. https:\/\/regexlib.com\/ Accessed","year":"2023","unstructured":"RegexLib. 2023. Regular Expression Library. https:\/\/regexlib.com\/ Accessed : March 11, 2023 . RegexLib. 2023. Regular Expression Library. https:\/\/regexlib.com\/ Accessed: March 11, 2023."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/1039834.1039864"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2015.2430313"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-17462-0_24"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkp885"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2008.14"},{"key":"e_1_2_1_43_1","volume-title":"Snort - Network Intrusion Detection & Prevention System. https:\/\/www.snort.org\/ Accessed","year":"2023","unstructured":"Snort. 2023. Snort - Network Intrusion Detection & Prevention System. https:\/\/www.snort.org\/ Accessed : March 11, 2023 . Snort. 2023. Snort - Network Intrusion Detection & Prevention System. https:\/\/www.snort.org\/ Accessed: March 11, 2023."},{"key":"e_1_2_1_44_1","volume-title":"https:\/\/spamassassin.apache.org\/ Accessed","author":"SpamAssassin Apache","year":"2023","unstructured":"Apache SpamAssassin . 2022. Apache SpamAssassin . https:\/\/spamassassin.apache.org\/ Accessed : March 11, 2023 . Apache SpamAssassin. 2022. Apache SpamAssassin. https:\/\/spamassassin.apache.org\/ Accessed: March 11, 2023."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/800125.804029"},{"key":"e_1_2_1_46_1","volume-title":"Suricata - Open Source Intrusion Detection and Prevention Engine. https:\/\/suricata.io\/ Accessed","year":"2023","unstructured":"Suricata. 2023. Suricata - Open Source Intrusion Detection and Prevention Engine. https:\/\/suricata.io\/ Accessed : March 11, 2023 . Suricata. 2023. Suricata - Open Source Intrusion Detection and Prevention Engine. https:\/\/suricata.io\/ Accessed: March 11, 2023."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/363347.363387"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3428286"},{"key":"e_1_2_1_49_1","volume-title":"Hyperscan: A Fast Multi-Pattern Regex Matcher for Modern CPUs. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI \u201919)","author":"Wang Xiang","year":"2019","unstructured":"Xiang Wang , Yang Hong , Harry Chang , KyoungSoo Park , Geoff Langdale , Jiayu Hu , and Heqing Zhu . 2019 . Hyperscan: A Fast Multi-Pattern Regex Matcher for Modern CPUs. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI \u201919) . USENIX Association, Boston, MA. 631\u2013648. https:\/\/www.usenix.org\/conference\/nsdi19\/presentation\/wang-xiang Xiang Wang, Yang Hong, Harry Chang, KyoungSoo Park, Geoff Langdale, Jiayu Hu, and Heqing Zhu. 2019. Hyperscan: A Fast Multi-Pattern Regex Matcher for Modern CPUs. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI \u201919). USENIX Association, Boston, MA. 631\u2013648. https:\/\/www.usenix.org\/conference\/nsdi19\/presentation\/wang-xiang"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1185347.1185360"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3586044","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3586044","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3586044","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:11Z","timestamp":1750178771000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3586044"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,6]]},"references-count":50,"journal-issue":{"issue":"OOPSLA1","published-print":{"date-parts":[[2023,4,6]]}},"alternative-id":["10.1145\/3586044"],"URL":"https:\/\/doi.org\/10.1145\/3586044","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,6]]},"assertion":[{"value":"2023-04-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}