{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T01:49:08Z","timestamp":1773193748965,"version":"3.50.1"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA2","license":[{"start":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T00:00:00Z","timestamp":1728345600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["2313062"],"award-info":[{"award-number":["2313062"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,10,8]]},"abstract":"<jats:p>Multi-pattern matching is widely used in modern software for applications requiring high throughput such as protein search, network traffic inspection, virus or spam detection. Graphics Processor Units (GPUs) excel at executing massively parallel workloads. Regular expression (regex) matching is typically performed by simulating the execution of deterministic finite automata (DFAs) or nondeterministic finite automata (NFAs). The natural implementations of these automata simulation algorithms on GPUs are highly inefficient because they give rise to irregular memory access patterns.<\/jats:p>\n                  <jats:p>This paper presents HybridSA, a heterogeneous CPU-GPU parallel engine for multi-pattern matching. HybridSA uses bit parallelism to efficiently simulate NFAs on GPUs, thus reducing the number of memory accesses and increasing the throughput. Our bit-parallel algorithms extend the classical shift-and algorithm for string matching to a large class of regular expressions and reduce automata simulation to a small number of bitwise operations. We have developed a compiler to translate regular expressions into bit masks, perform optimizations, and choose the best algorithms to run on the GPU. The majority of the regular expressions are accelerated on the GPU, while the patterns that exhibit random memory accesses are executed on the CPU in parallel. We evaluate HybridSA against state-of-the-art CPU and GPU engines, as well as a hybrid combination of the two. HybridSA achieves between 4 and 60 times higher throughput than the state-of-the-art CPU engine and between 4 and 233 times better than the state-of-the-art GPU engine across a collection of real-world benchmarks.<\/jats:p>","DOI":"10.1145\/3689771","type":"journal-article","created":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T03:23:04Z","timestamp":1728357784000},"page":"1699-1728","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["HybridSA: GPU Acceleration of Multi-pattern Regex Matching using Bit Parallelism"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5444-5924","authenticated-orcid":false,"given":"Alexis","family":"Le Glaunec","sequence":"first","affiliation":[{"name":"Rice University, Houston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0672-2998","authenticated-orcid":false,"given":"Lingkun","family":"Kong","sequence":"additional","affiliation":[{"name":"Rice University, Houston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1209-7738","authenticated-orcid":false,"given":"Konstantinos","family":"Mamouras","sequence":"additional","affiliation":[{"name":"Rice University, Houston, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,10,8]]},"reference":[{"key":"e_1_3_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/360825.360855"},{"key":"e_1_3_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2015.2429918"},{"key":"e_1_3_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/135239.135243"},{"key":"e_1_3_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-75632-5_5"},{"issue":"5","key":"e_1_3_1_6_1","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1145\/1880153.1880157","article-title":"iNFAnt: NFA Pattern Matching on GPGP U Devices","volume":"40","author":"Cascarano Niccolo\u2019","year":"2010","unstructured":"Niccolo\u2019 Cascarano, Pierluigi Rolando, Fulvio Risso, and Riccardo Sisto. 2010. iNFAnt: NFA Pattern Matching on GPGP U Devices. ACM SIGCOMM Computer Communication Review 40, 5 (2010), 20-26. https:\/\/doi.org\/10.1145\/1880153.1880157 10.1145\/1880153.1880157","journal-title":"ACM SIGCOMM Computer Communication Review"},{"key":"e_1_3_1_7_1","unstructured":"Genome Reference Consortium. 2022. Genome Reference Consortium - Human Genome Overview. https:\/\/www.ncbi.nlm.nih.gov\/grc\/human Accessed: March 1 2024."},{"key":"e_1_3_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2014.8"},{"key":"e_1_3_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3617232.3624848"},{"key":"e_1_3_1_10_1","doi-asserted-by":"publisher","DOI":"10.1070\/RM1961v016n05ABEH004112"},{"key":"e_1_3_1_11_1","unstructured":"GNU Grep. 2022. GNU Grep - Global Regular Expression Print. https:\/\/www.gnu.org\/software\/grep\/ Accessed: March 11 2023."},{"key":"e_1_3_1_12_1","first-page":"25","volume-title":"In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)","author":"Huang Yi","year":"2022","unstructured":"Yi Huang, Zhiyu Chen, Dai Li, and Kaiyuan Yang. 2022. CAMA: Energy and Memory Efficient Automata Processing in Content-Addressable Memories. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, USA, 25-37. https:\/\/doi.org\/10.1109\/HPCA53966.2022.00011 10.1109\/HPCA53966.2022.00011"},{"key":"e_1_3_1_13_1","first-page":"317","volume-title":"Proceedings of the 2012 ACM Conference on Computer and Communications Security (Raleigh, North Carolina, USA) (CCS \u201912)","author":"Jamshed Muhammad Asim","year":"2012","unstructured":"Muhammad Asim Jamshed, Jihyung Lee, Sangwoo Moon, Insu Yun, Deokjin Kim, Sungryoul Lee, Yung Yi, and KyoungSoo Park. 2012. Kargus: a highly-scalable software-based intrusion detection system. In Proceedings of the 2012 ACM Conference on Computer and Communications Security (Raleigh, North Carolina, USA) (CCS \u201912). Association for Computing Machinery, New York, NY, USA, 317-328. https:\/\/doi.org\/10.1145\/2382196.2382232 10.1145\/2382196.2382232"},{"key":"e_1_3_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/364175.364185"},{"key":"e_1_3_1_15_1","first-page":"3","volume-title":"Number 34 in Annals of Mathematics Studies","author":"Kleene Stephen Cole","year":"1956","unstructured":"Stephen Cole Kleene . 1956. Representation of Events in Nerve Nets and Finite Automata. In Automata Studies, Claude E. Shannon and John McCarthy (Eds.). Number 34 in Annals of Mathematics Studies. Princeton University Press, 3-41."},{"key":"e_1_3_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523456"},{"key":"e_1_3_1_17_1","first-page":"30","article-title":"Regular Expression Matching Using Bit Vector Automata","volume":"7","author":"Le Glaunec Alexis","year":"2023","unstructured":"Alexis Le Glaunec, Lingkun Kong, and Konstantinos Mamouras. 2023. Regular Expression Matching Using Bit Vector Automata. Proceedings of the ACM on Programming Languages 7, OOPSLA1, Article 92 (2023), 30 pages. https:\/\/doi.org\/10.1145\/3586044 10.1145\/3586044","journal-title":"Proceedings of the ACM on Programming Languages"},{"key":"e_1_3_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378471"},{"issue":"1","key":"e_1_3_1_19_1","first-page":"27","article-title":"Asynchronous Automata Processing on GPUs","volume":"7","author":"Liu Hongyuan","year":"2023","unstructured":"Hongyuan Liu, Sreepathi Pai, and Adwait Jog. 2023. Asynchronous Automata Processing on GPUs. Proceedings of the ACM on Measurement and Analysis of Computing Systems 7, 1, Article 27 (2023), 27 pages. https:\/\/doi.org\/10.1145\/3579453 10.1145\/3579453","journal-title":"Proceedings of the ACM on Measurement and Analysis of Computing Systems"},{"key":"e_1_3_1_20_1","first-page":"31","article-title":"Efficient Matching of Regular Expressions with Lookaround Assertions","volume":"8","author":"Mamouras Konstantinos","year":"2024","unstructured":"Konstantinos Mamouras and Agnishom Chattopadhyay. 2024. Efficient Matching of Regular Expressions with Lookaround Assertions. Proceedings of the ACM on Programming Languages 8, POPL, Article 92 (2024), 31 pages. https:\/\/doi.org\/10.1145\/3632934 10.1145\/3632934","journal-title":"Proceedings of the ACM on Programming Languages"},{"key":"e_1_3_1_21_1","first-page":"25","article-title":"Static Analysis for Checking the Disambiguation Robustness of Regular Expressions","volume":"8","author":"Mamouras Konstantinos","year":"2024","unstructured":"Konstantinos Mamouras, Alexis Le Glaunec, Wu Angela Li, and Agnishom Chattopadhyay. 2024. Static Analysis for Checking the Disambiguation Robustness of Regular Expressions. Proceedings of the ACM on Programming Languages 8, PLDI, Article 231 (2024), 25 pages. https:\/\/doi.org\/10.1145\/3656461 10.1145\/3656461","journal-title":"Proceedings of the ACM on Programming Languages"},{"key":"e_1_3_1_22_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.411"},{"key":"e_1_3_1_23_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781316135228"},{"key":"e_1_3_1_24_1","unstructured":"NVidia. 2013a. How to Access Global Memory Efficiently in CUDA C\/C++ Kernels. https:\/\/developer.nvidia.com\/blog\/how-access-global-memory-efficiently-cuda-c-kernels\/"},{"key":"e_1_3_1_25_1","unstructured":"NVidia. 2013b. Using Shared Memory in CUDA C\/C++. https:\/\/developer.nvidia.com\/blog\/using-shared-memory-cuda-cc\/"},{"key":"e_1_3_1_26_1","unstructured":"NVidia. 2024. CUDA C++ Programming Guide: Programming Model. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html#programming-model"},{"key":"e_1_3_1_27_1","first-page":"138","volume-title":"In 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","author":"Rahimi Reza","year":"2020","unstructured":"Reza Rahimi, Elaheh Sadredini, Mircea Stan, and Kevin Skadron. 2020. Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs. In 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, USA, 138-147. https:\/\/doi.org\/10.1109\/FCCM48280.2020.00027 10.1109\/FCCM48280.2020.00027"},{"key":"e_1_3_1_28_1","unstructured":"RE2. 2023. RE2: Google\u2019s regular expression library. Website. https:\/\/github.com\/google\/re2 Accessed: March 11 2023."},{"key":"e_1_3_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1039834.1039864"},{"key":"e_1_3_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2015.2430313"},{"key":"e_1_3_1_31_1","first-page":"372","volume-title":"Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2019) (LNCS, Vol. 11427)","author":"Saarikivi Olli","year":"2019","unstructured":"Olli Saarikivi, Margus Veanes, Tiki Wan, and Eric Xu. 2019. Symbolic Regex Matcher. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2019) (LNCS, Vol. 11427), Tom\u00e1s Vojnar and Lijun Zhang (Eds.). Springer, Cham, 372-378. https:\/\/doi.org\/10.1007\/978-3-030-17462-0_24 10.1007\/978-3-030-17462-0_24"},{"key":"e_1_3_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/1058426.1058885"},{"issue":"1","key":"e_1_3_1_33_1","first-page":"D161","article-title":"PROSITE, A Protein Domain Database for Functional Characterization and Annotation","volume":"38","author":"Sigrist Christian J. A.","year":"2009","unstructured":"Christian J. A. Sigrist, Lorenzo Cerutti, Edouard de Castro, Petra S. Langendijk-Genevaux, Virginie Bulliard, Amos Bairoch, and Nicolas Hulo. 2009. PROSITE, A Protein Domain Database for Functional Characterization and Annotation. Nucleic Acids Research 38, suppl_1 (2009), D161-D166. https:\/\/doi.org\/10.1093\/nar\/gkp885 10.1093\/nar\/gkp885","journal-title":"Nucleic Acids Research"},{"key":"e_1_3_1_34_1","first-page":"7","volume-title":"Proceedings of the 12th International Workshop on Data Management on New Hardware (DaMoN \u201916)","author":"Sitaridi Evangelia","year":"2016","unstructured":"Evangelia Sitaridi, Orestis Polychroniou, and Kenneth A. Ross. 2016. SIMD-Accelerated Regular Expression Matching. In Proceedings of the 12th International Workshop on Data Management on New Hardware (DaMoN \u201916). ACM, New York, NY, USA, Article 8, 7 pages. https:\/\/doi.org\/10.1145\/2933349.2933357 10.1145\/2933349.2933357"},{"key":"e_1_3_1_35_1","unstructured":"Snort. 2024. Snort - Network Intrusion Detection & Prevention System. https:\/\/www.snort.org\/ Accessed: March 11 2024."},{"key":"e_1_3_1_36_1","unstructured":"Apache SpamAssassin. 2024. Apache SpamAssassin. https:\/\/spamassassin.apache.org\/ Accessed: March 11 2024."},{"key":"e_1_3_1_37_1","unstructured":"Suricata. 2024. Suricata - Open Source Intrusion Detection and Prevention Engine. https:\/\/suricata.io\/ Accessed: March 11 2024."},{"key":"e_1_3_1_38_1","unstructured":"The PCRE2 Developers. 2024. Perl-compatible Regular Expressions (revised API: PCRE2). https:\/\/pcre2project.github.io\/pcre2\/doc\/html\/index.html."},{"key":"e_1_3_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/363347.363387"},{"key":"e_1_3_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2011.6114181"},{"key":"e_1_3_1_41_1","first-page":"13","volume-title":"In 2018 IEEE International Symposium on Workload Characterization (IISWC)","author":"Wadden Jack","year":"2018","unstructured":"Jack Wadden, Tommy Tracy, Elaheh Sadredini, Lingxi Wu, Chunkun Bo, Jesse Du, Yizhou Wei, Jeffrey Udall, Matthew Wallace, Mircea Stan, and Kevin Skadron. 2018. AutomataZoo: A Modern Automata Processing Benchmark Suite. In 2018 IEEE International Symposium on Workload Characterization (IISWC). IEEE, New York, NY, USA, 13-24. https:\/\/doi.org\/10.1109\/IISWC.2018.8573482 10.1109\/IISWC.2018.8573482"},{"key":"e_1_3_1_42_1","first-page":"3","volume-title":"Proceedings of the Eleventh IEEE\/ACM\/IFIP International Conference on Hardware\/Software Codesign and System Synthesis (CODES \u201916)","author":"Wang Ke","year":"2016","unstructured":"Ke Wang, Kevin Angstadt, Chunkun Bo, Nathan Brunelle, Elaheh Sadredini, Tommy Tracy, Jack Wadden, Mircea Stan, and Kevin Skadron. 2016. An Overview of Micron\u2019s Automata Processor. In Proceedings of the Eleventh IEEE\/ACM\/IFIP International Conference on Hardware\/Software Codesign and System Synthesis (CODES \u201916). ACM, New York, NY, USA, Article 14, 3 pages. https:\/\/doi.org\/10.1145\/2968456.2976763 10.1145\/2968456.2976763"},{"key":"e_1_3_1_43_1","first-page":"366","volume-title":"In Proceedings of the 2011 Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS \u201911)","author":"Wang Lei","year":"2011","unstructured":"Lei Wang, Shuhui Chen, Yong Tang, and Jinshu Su. 2011. Gregex: GP U Based High Speed Regular Expression Matching Engine. In Proceedings of the 2011 Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS \u201911). IEEE Computer Society, USA, 366-370. https:\/\/doi.org\/10.1109\/IMIS.2011.107 10.1109\/IMIS.2011.107"},{"key":"e_1_3_1_44_1","first-page":"631","volume-title":"In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI \u201919)","author":"Wang Xiang","year":"2019","unstructured":"Xiang Wang, Yang Hong, Harry Chang, KyoungSoo Park, Geoff Langdale, Jiayu Hu, and Heqing Zhu. 2019. Hyperscan: A Fast Multi-Pattern Regex Matcher for Modern CP Us. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI \u201919). USENIX Association, Boston, MA, 631-648. https:\/\/www.usenix.org\/conference\/nsdi19\/presentation\/wang-xiang"},{"key":"e_1_3_1_45_1","first-page":"481","volume-title":"In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Wang Yuguang","year":"2022","unstructured":"Yuguang Wang, Robbie Watling, Junqiao Qiu, and Zhenlin Wang. 2022. GSpecPal: Speculation-Centric Finite State Machine Parallelization on GPUs. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, USA, 481-491. https:\/\/doi.org\/10.1109\/IPDPS53621.2022.00053 10.1109\/IPDPS53621.2022.00053"},{"key":"e_1_3_1_46_1","first-page":"151","volume-title":"In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS \u201924)","author":"Wen Ziyuan","year":"2024","unstructured":"Ziyuan Wen, Lingkun Kong, Alexis Le Glaunec, Konstantinos Mamouras, and Kaiyuan Yang. 2024. BVAP: Energy and Memory Efficient Automata Processing for Regular Expressions. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS \u201924). ACM, New York, NY, USA, 151-166. https:\/\/doi.org\/10.1145\/3620665.3640412 10.1145\/3620665.3640412"},{"key":"e_1_3_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/135239.135244"},{"key":"e_1_3_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.129"},{"key":"e_1_3_1_49_1","unstructured":"Yara. 2024. ClamAV - The pattern matching swiss knife for malware researchers. Website. https:\/\/virustotal.github.io\/yara\/ Accessed: August 10 2024."},{"key":"e_1_3_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1185347.1185360"},{"key":"e_1_3_1_51_1","first-page":"10","volume-title":"Proceedings of the ACM International Conference on Computing Frontiers (CF \u201913)","author":"Yu Xiaodong","year":"2013","unstructured":"Xiaodong Yu and Michela Becchi. 2013. GP U Acceleration of Regular Expression Matching for Large Datasets: Exploring the Implementation Space. In Proceedings of the ACM International Conference on Computing Frontiers (CF \u201913). ACM, New York, NY, USA, Article 18, 10 pages. https:\/\/doi.org\/10.1145\/2482767.2482791 10.1145\/2482767.2482791"},{"key":"e_1_3_1_52_1","first-page":"129","volume-title":"Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (New Orleans, Louisiana, USA) (PPoPP \u201912)","author":"Yuan Zu","year":"2012","unstructured":"Yuan Zu, Ming Yang, Zhonghu Xu, Lin Wang, Xin Tian, Kunyang Peng, and Qunfeng Dong. 2012. GPU-based NFA Implementation for Memory Efficient High Speed Regular Expression Matching. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (New Orleans, Louisiana, USA) (PPoPP \u201912). ACM, New York, NY, USA, 129-140. https:\/\/doi.org\/10.1145\/2145816.2145833 10.1145\/2145816.2145833"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689771","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3689771","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T09:05:28Z","timestamp":1770195928000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689771"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,8]]},"references-count":51,"journal-issue":{"issue":"OOPSLA2","published-print":{"date-parts":[[2024,10,8]]}},"alternative-id":["10.1145\/3689771"],"URL":"https:\/\/doi.org\/10.1145\/3689771","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,8]]},"assertion":[{"value":"2024-04-06","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}