{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T14:29:09Z","timestamp":1754144949636,"version":"3.41.2"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"PLDI","license":[{"start":{"date-parts":[[2025,6,13]],"date-time":"2025-06-13T00:00:00Z","timestamp":1749772800000},"content-version":"vor","delay-in-days":3,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF 2313062, CCF 2146518, CCF 2107261"],"award-info":[{"award-number":["CCF 2313062, CCF 2146518, CCF 2107261"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2025,6,10]]},"abstract":"<jats:p>This paper is about semantic regular expressions (SemREs). This is a concept that was recently proposed by Smore (Chen et al. 2023) in which classical regular expressions are extended with a primitive to query external oracles such as databases and large language models (LLMs). SemREs can be used to identify lines of text containing references to semantic concepts such as cities, celebrities, political entities, etc. The focus in their paper was on automatically synthesizing semantic regular expressions from positive and negative examples.<\/jats:p>\n          <jats:p>\n            In this paper, we study the\n            <jats:italic toggle=\"yes\">membership testing problem<\/jats:italic>\n            .\n          <\/jats:p>\n          <jats:p>\n            First, we present a two-pass NFA-based algorithm to determine whether a string\n            <jats:italic toggle=\"yes\">w<\/jats:italic>\n            matches a SemRE\n            <jats:italic toggle=\"yes\">r<\/jats:italic>\n            in\n            <jats:italic toggle=\"yes\">O<\/jats:italic>\n            (|\n            <jats:italic toggle=\"yes\">r<\/jats:italic>\n            |\n            <jats:sup>2<\/jats:sup>\n            |\n            <jats:italic toggle=\"yes\">w<\/jats:italic>\n            |\n            <jats:sup>2<\/jats:sup>\n            + |\n            <jats:italic toggle=\"yes\">r<\/jats:italic>\n            | |\n            <jats:italic toggle=\"yes\">w<\/jats:italic>\n            |\n            <jats:sup>3<\/jats:sup>\n            ) time, assuming the oracle responds to each query in unit time. In common situations, where oracle queries are not nested, we show that this procedure runs in\n            <jats:italic toggle=\"yes\">O<\/jats:italic>\n            (|\n            <jats:italic toggle=\"yes\">r<\/jats:italic>\n            |\n            <jats:sup>2<\/jats:sup>\n            |\n            <jats:italic toggle=\"yes\">w<\/jats:italic>\n            |\n            <jats:sup>2<\/jats:sup>\n            ) time. Experiments with a prototype implementation of this algorithm validate our theoretical analysis, and show that the procedure massively outperforms a dynamic programming-based baseline, and incurs a \u2248 2 \u00d7 overhead over the time needed for interaction with the oracle.\n          <\/jats:p>\n          <jats:p>\n            Second, we establish connections between SemRE membership testing and the triangle finding problem from graph theory, which suggest that developing algorithms which are simultaneously practical and asymptotically faster might be challenging. Furthermore, algorithms for classical regular expressions primarily aim to optimize their time and memory consumption. In contrast, an important consideration in our setting is to minimize the cost of invoking the oracle. We demonstrate an \u03a9(|\n            <jats:italic toggle=\"yes\">w<\/jats:italic>\n            |\n            <jats:sup>2<\/jats:sup>\n            ) lower bound on the number of oracle queries necessary to make this determination.\n          <\/jats:p>","DOI":"10.1145\/3729300","type":"journal-article","created":{"date-parts":[[2025,6,13]],"date-time":"2025-06-13T16:02:27Z","timestamp":1749830547000},"page":"1245-1268","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Membership Testing for Semantic Regular Expressions"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-5675-4065","authenticated-orcid":false,"given":"Yifei","family":"Huang","sequence":"first","affiliation":[{"name":"University of Southern California, Los Angeles, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8776-3925","authenticated-orcid":false,"given":"Matin","family":"Amini","sequence":"additional","affiliation":[{"name":"University of Southern California, Los Angeles, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5444-5924","authenticated-orcid":false,"given":"Alexis","family":"Le Glaunec","sequence":"additional","affiliation":[{"name":"Rice University, Houston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1209-7738","authenticated-orcid":false,"given":"Konstantinos","family":"Mamouras","sequence":"additional","affiliation":[{"name":"Rice University, Houston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2879-0932","authenticated-orcid":false,"given":"Mukund","family":"Raghothaman","sequence":"additional","affiliation":[{"name":"University of Southern California, Los Angeles, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,6,13]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007352.1007390"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/0890-5401(87)90052-6"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(95)00182-4"},{"key":"e_1_2_2_4_1","first-page":"1209","article-title":"On Economical Construction of the Transitive Closure of an Oriented Graph","volume":"11","author":"Arlazarov Vladimir","year":"1970","unstructured":"Vladimir Arlazarov, Yefim A. Dinitz, M. A. Kronrod, and Igor Faradzhev. 1970. On Economical Construction of the Transitive Closure of an Oriented Graph. Soviet Mathematics Doklady, 11 (1970), 1209\u20131210. issn:0197\u20136788","journal-title":"Soviet Mathematics Doklady"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3591300"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1971.223204"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/321239.321249"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2021.106135"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3622863"},{"key":"e_1_2_2_10_1","volume-title":"Binding Language Models in Symbolic Languages. In The Eleventh International Conference on Learning Representations. OpenReview.net. https:\/\/openreview.net\/forum?id=lH1PV42cbF","author":"Cheng Zhoujun","year":"2023","unstructured":"Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah Smith, and Tao Yu. 2023. Binding Language Models in Symbolic Languages. In The Eleventh International Conference on Learning Representations. OpenReview.net. https:\/\/openreview.net\/forum?id=lH1PV42cbF"},{"key":"e_1_2_2_11_1","unstructured":"Russ Cox. 2010. Implementing Regular Expressions."},{"key":"e_1_2_2_12_1","unstructured":"Nissar Chababy et al.. 2023. PyFunceble. https:\/\/github.com\/funilrys\/PyFunceble"},{"key":"e_1_2_2_13_1","unstructured":"Philip Hazel et al.. 1997. PCRE - Perl Compatible Regular Expressions. https:\/\/www.pcre.org\/"},{"key":"e_1_2_2_14_1","unstructured":"Russ Cox et al.. 2010. RE2: A Principled Approach to Regular Expression Matching. https:\/\/opensource.googleblog.com\/2010\/03\/re2-principled-approach-to-regular.html"},{"volume-title":"Introduction to Automata Theory, Languages, and Computation","author":"Hopcroft John","key":"e_1_2_2_15_1","unstructured":"John Hopcroft and Jeffrey Ullman. 1979. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.15048870"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/256167.256195"},{"key":"e_1_2_2_18_1","first-page":"457","article-title":"Validating Large Language Models with ReLM","volume":"5","author":"Kuchnik Michael","year":"2023","unstructured":"Michael Kuchnik, Virginia Smith, and George Amvrosiadis. 2023. Validating Large Language Models with ReLM. Proceedings of Machine Learning and Systems, 5 (2023), 457\u2013476.","journal-title":"Proceedings of Machine Learning and Systems"},{"volume-title":"Mathematical Foundations of Computer Science","author":"Kupferman Orna","key":"e_1_2_2_19_1","unstructured":"Orna Kupferman and Sharon Zuhovitzky. 2002. An Improved Algorithm for the Membership Problem for Extended Regular Expressions. In Mathematical Foundations of Computer Science. Springer, 446\u2013458. isbn:978-3-540-45687-2"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3586044"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/505241.505242"},{"key":"e_1_2_2_22_1","unstructured":"Lee McMahon. 1974. sed."},{"key":"e_1_2_2_23_1","unstructured":"Meta. 2024. LlaMa3. https:\/\/github.com\/meta-llama\/llama3\/blob\/main\/MODEL_CARD.md"},{"key":"e_1_2_2_24_1","unstructured":"Erik Nijkamp Bo Pang Hiroaki Hayashi Lifu Tu Huan Wang Yingbo Zhou Silvio Savarese and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations. OpenReview.net. https:\/\/openreview.net\/forum?id=iaYcJKpY2B_"},{"key":"e_1_2_2_25_1","unstructured":"OpenAI. 2024. Pricing. https:\/\/openai.com\/api\/pricing\/"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583660.3583664"},{"volume-title":"The Membership Problem for Regular Expressions with Intersection Is Complete in LOGCFL","author":"Petersen Holger","key":"e_1_2_2_27_1","unstructured":"Holger Petersen. 2002. The Membership Problem for Regular Expressions with Intersection Is Complete in LOGCFL. In STACS. Springer, 513\u2013522. isbn:978-3-540-45841-8"},{"key":"e_1_2_2_28_1","unstructured":"QuantFactory. 2024. LlaMa3. https:\/\/huggingface.co\/QuantFactory\/Meta-Llama-3-8B-GGUF"},{"key":"e_1_2_2_29_1","unstructured":"Zach Rice. 2022. Gitleaks. https:\/\/gitleaks.io\/"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-71389-0_24"},{"volume-title":"Tools and Algorithms for the Construction and Analysis of Systems (TACAS)","author":"Saarikivi Olli","key":"e_1_2_2_31_1","unstructured":"Olli Saarikivi, Margus Veanes, Tiki Wan, and Eric Xu. 2019. Symbolic Regex Matcher. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Springer, 372\u2013378. isbn:978-3-030-17462-0"},{"key":"e_1_2_2_32_1","volume-title":"Introduction to the Theory of Computation","author":"Sipser Michael","year":"1877","unstructured":"Michael Sipser. 2012. Introduction to the Theory of Computation (3rd ed.). Cengage Learning. isbn:978-1133187790","edition":"3"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/363347.363387"},{"key":"e_1_2_2_34_1","unstructured":"Ken Thompson. 1973. grep."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-0000(75)80046-8"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3704837"},{"key":"e_1_2_2_37_1","unstructured":"Marcell Vazquez-Chanlatte Karim Elmaaroufi Stefan Witwicki and Sanjit Seshia. 2024. L^*LM: Learning Automata from Examples using Natural Language Oracles. arxiv:2402.07051."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2103656.2103674"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485477"},{"key":"e_1_2_2_40_1","volume-title":"Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association, 631\u2013648","author":"Wang Xiang","year":"2019","unstructured":"Xiang Wang, Yang Hong, Harry Chang, KyoungSoo Park, Geoff Langdale, Jiayu Hu, and Heqing Zhu. 2019. Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association, 631\u2013648. isbn:978-1-931971-49-2 https:\/\/www.usenix.org\/conference\/nsdi19\/presentation\/wang-xiang"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2010.67"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ic.2018.02.006"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729300","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729300","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,16]],"date-time":"2025-07-16T06:04:08Z","timestamp":1752645848000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729300"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,10]]},"references-count":42,"journal-issue":{"issue":"PLDI","published-print":{"date-parts":[[2025,6,10]]}},"alternative-id":["10.1145\/3729300"],"URL":"https:\/\/doi.org\/10.1145\/3729300","relation":{},"ISSN":["2475-1421"],"issn-type":[{"type":"electronic","value":"2475-1421"}],"subject":[],"published":{"date-parts":[[2025,6,10]]},"assertion":[{"value":"2024-11-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}