{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T02:25:31Z","timestamp":1773887131902,"version":"3.50.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"PLDI","license":[{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["2313062"],"award-info":[{"award-number":["2313062"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,6,20]]},"abstract":"<jats:p>Regular expressions are commonly used for finding and extracting matches from sequence data. Due to the inherent ambiguity of regular expressions, a disambiguation policy must be considered for the match extraction problem, in order to uniquely determine the desired match out of the possibly many matches. The most common disambiguation policies are the POSIX policy and the greedy (PCRE) policy. The POSIX policy chooses the longest match out of the leftmost ones. The greedy policy chooses a leftmost match and further disambiguates using a greedy interpretation of Kleene iteration to match as many times as possible. The choice of disambiguation policy can affect the output of match extraction, which can be an issue for reusing regular expressions across regex engines. In this paper, we introduce and study the notion of disambiguation robustness for regular expressions. A regular expression is robust if its extraction semantics is indifferent to whether the POSIX or greedy disambiguation policy is chosen. 
This gives rise to a decision problem for regular expressions, which we prove to be PSPACE-complete. We propose a static analysis algorithm for checking the (non-)robustness of regular expressions and two performance optimizations. We have implemented the proposed algorithms and we have shown experimentally that they are practical for analyzing large datasets of regular expressions derived from various application domains.<\/jats:p>","DOI":"10.1145\/3656461","type":"journal-article","created":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T16:27:20Z","timestamp":1718900840000},"page":"2073-2097","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Static Analysis for Checking the Disambiguation Robustness of Regular Expressions"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1209-7738","authenticated-orcid":false,"given":"Konstantinos","family":"Mamouras","sequence":"first","affiliation":[{"name":"Rice University, Houston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5444-5924","authenticated-orcid":false,"given":"Alexis","family":"Le Glaunec","sequence":"additional","affiliation":[{"name":"Rice University, Houston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4523-3401","authenticated-orcid":false,"given":"Wu Angela","family":"Li","sequence":"additional","affiliation":[{"name":"Rice University, Houston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0462-8080","authenticated-orcid":false,"given":"Agnishom","family":"Chattopadhyay","sequence":"additional","affiliation":[{"name":"Rice University, Houston, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,6,20]]},"reference":[{"key":"e_1_3_1_2_1","unstructured":"awk 2024. Gawk: Effective AWK Programming. https:\/\/www.gnu.org\/software\/gawk\/. 
[Online; accessed March 24 2024]."},{"key":"e_1_3_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-75632-5_5"},{"key":"e_1_3_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2021.01.010"},{"key":"e_1_3_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2016.09.006"},{"key":"e_1_3_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-60134-2_2"},{"key":"e_1_3_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1971.223204"},{"key":"e_1_3_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00236-020-00366-7"},{"key":"e_1_3_1_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.2881"},{"key":"e_1_3_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1836089.1836120"},{"key":"e_1_3_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/256167.256174"},{"key":"e_1_3_1_12_1","unstructured":"Russ Cox. 2010. Regular Expression Matching in the Wild. https:\/\/swtch.com\/rsc\/regexp\/regexp3.html. [Online; accessed March 24 2024]."},{"key":"e_1_3_1_13_1","first-page":"29","volume-title":"USENIX Security Symposium","author":"Crosby Scott A.","year":"2003","unstructured":"Scott A. Crosby and Dan S. Wallach. 2003. Denial of Service via Algorithmic Complexity Attacks. In USENIX Security Symposium. USENIX Association, USA, 29\u201344. 
https:\/\/www.usenix.org\/legacy\/event\/sec03\/tech\/full_papers\/crosby\/crosby.pdf"},{"key":"e_1_3_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3236027"},{"key":"e_1_3_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338909"},{"key":"e_1_3_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360000037"},{"key":"e_1_3_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-012-9389-0"},{"key":"e_1_3_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-27836-8_53"},{"key":"e_1_3_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-39274-0_7"},{"key":"e_1_3_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10882-7_14"},{"key":"e_1_3_1_21_1","unstructured":"grep 2024. GNU Grep (Global Regular Expression Print). https:\/\/www.gnu.org\/software\/grep\/. [Online; accessed March 24 2024]."},{"key":"e_1_3_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/WSE.2010.5623572"},{"key":"e_1_3_1_23_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380210803"},{"key":"e_1_3_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523456"},{"key":"e_1_3_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/SPIRE.2000.878194"},{"key":"e_1_3_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3586044"},{"key":"e_1_3_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3632934"},{"key":"e_1_3_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00047"},{"key":"e_1_3_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21254-3_32"},{"key":"e_1_3_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-18098-9_25"},{"key":"e_1_3_1_31_1","unstructured":"PCRE. 2024. pcre2pattern man page. https:\/\/www.pcre.org\/current\/doc\/html\/pcre2pattern.html. [Online; accessed March 24 2024]."},{"key":"e_1_3_1_32_1","unstructured":"RE2. 2024. RE2: Google\u2019s regular expression library. Website. 
https:\/\/github.com\/google\/re2 [Online; accessed March 24 2024]."},{"key":"e_1_3_1_33_1","unstructured":"RegexLib. 2024. Regular Expression Library. https:\/\/regexlib.com\/ [Online; accessed March 24 2024]."},{"key":"e_1_3_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/1039834.1039864"},{"key":"e_1_3_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2015.2430313"},{"key":"e_1_3_1_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jal.2011.11.003"},{"key":"e_1_3_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-0000(70)80006-X"},{"key":"e_1_3_1_38_1","unstructured":"sed 2024. GNU sed (stream editor): non-interactive command-line text editor. https:\/\/www.gnu.org\/software\/sed\/. [Online; accessed March 24 2024]."},{"key":"e_1_3_1_39_1","unstructured":"Snort. 2024. Snort - Network Intrusion Detection & Prevention System. https:\/\/www.snort.org\/ [Online; accessed March 24 2024]."},{"key":"e_1_3_1_40_1","unstructured":"Apache SpamAssassin. 2024. Apache SpamAssassin. https:\/\/spamassassin.apache.org\/ [Online; accessed March 24 2024]."},{"key":"e_1_3_1_41_1","first-page":"361","volume-title":"27th USENIX Security Symposium (USENIX Security 2018)","author":"Staicu Cristian-Alexandru","year":"2018","unstructured":"Cristian-Alexandru Staicu and Michael Pradel. 2018. Freezing the Web: A Study of ReDoS Vulnerabilities in JavaScript-based Web Servers. In 27th USENIX Security Symposium (USENIX Security 2018). USENIX Association, USA, 361\u2013376. https:\/\/www.usenix.org\/conference\/usenixsecurity18\/presentation\/staicu"},{"key":"e_1_3_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370776.2370788"},{"key":"e_1_3_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-07151-0_13"},{"key":"e_1_3_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-40946-7_22"},{"key":"e_1_3_1_45_1","unstructured":"Suricata. 2024. Suricata - Open Source Intrusion Detection and Prevention Engine. 
https:\/\/suricata.io\/ [Online; accessed March 24 2024]."},{"key":"e_1_3_1_46_1","doi-asserted-by":"publisher","DOI":"10.4230\/LIPIcs.ITP.2023.27"},{"key":"e_1_3_1_47_1","doi-asserted-by":"publisher","unstructured":"Ulya Trofimovich. 2020. RE2C: A Lexer Generator Based on Lookahead-TDFA. Software Impacts 6 (2020) 100027. https:\/\/doi.org\/10.1016\/j.simpa.2020.100027","DOI":"10.1016\/j.simpa.2020.100027"},{"key":"e_1_3_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10817-023-09667-1"},{"key":"e_1_3_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/SANER.2019.8667972"},{"key":"e_1_3_1_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(91)90381-B"},{"key":"e_1_3_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640412"},{"key":"e_1_3_1_52_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2018.12.001"},{"key":"e_1_3_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1185347.1185360"}],"container-title":["Proceedings of the ACM on Programming 
Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656461","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3656461","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T20:43:43Z","timestamp":1751661823000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656461"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,20]]},"references-count":52,"journal-issue":{"issue":"PLDI","published-print":{"date-parts":[[2024,6,20]]}},"alternative-id":["10.1145\/3656461"],"URL":"https:\/\/doi.org\/10.1145\/3656461","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,20]]},"assertion":[{"value":"2024-06-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}