{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T14:56:58Z","timestamp":1776783418571,"version":"3.51.2"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,6,22]],"date-time":"2023-06-22T00:00:00Z","timestamp":1687392000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science Foundation","award":["1421059"],"award-info":[{"award-number":["1421059"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>Deterministic and Non-deterministic Finite Automata (DFA and NFA) comprise the core of many big data applications. Recent efforts to develop Domain-Specific Architectures (DSAs) for DFA\/NFA have taken divergent approaches, but achieving consistent throughput for arbitrarily-large pattern sets, state activation rates, and pattern match rates remains a challenge. In this article, we present NAPOLY (Non-Deterministic Automata Processor OverLaY), an FPGA overlay and associated compiler. A common limitation of prior efforts is a limit on NFA size for achieving the advertised throughput. NAPOLY is optimized for fast re-programming to permit practical time-division multiplexing of the hardware and permit high asymptotic throughput for NFAs of unlimited size, unlimited state activation rate, and high pattern reporting rate. NAPOLY also allows for offline generation of configurations having tradeoffs between state capacity and transition capacity. In this article, we (1) evaluate NAPOLY using benchmarks packaged in the ANMLZoo benchmark suite, (2) evaluate the use of an SAT solver for allocating physical resources, and (3) compare NAPOLY\u2019s performance against existing solutions. NAPOLY performs most favorably on larger benchmarks, benchmarks with higher state activation frequency, and benchmarks with higher reporting frequency. NAPOLY outperforms the fastest of the CPU and GPU implementations in 10 out of 12 benchmarks.<\/jats:p>","DOI":"10.1145\/3593586","type":"journal-article","created":{"date-parts":[[2023,4,24]],"date-time":"2023-04-24T12:12:48Z","timestamp":1682338368000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["NAPOLY: A Non-deterministic Automata Processor OverLaY"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-1391-0166","authenticated-orcid":false,"given":"Rasha","family":"Karakchi","sequence":"first","affiliation":[{"name":"University of South Carolina"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0821-6258","authenticated-orcid":false,"given":"Jason D.","family":"Bakos","sequence":"additional","affiliation":[{"name":"University of South Carolina"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,6,22]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"crossref","unstructured":"Kevin Angstadt Jack Wadden Vinh Dang Ted Xie Dan Kramp Westley Weimer Mircea Stan and Kevin Skadron. 2018. MNCaRT: An open-source multi-architecture automata-processing research and execution ecosystem. IEEE Computer Architecture Letters 17 1 (2018) 84\u201387. 10.1109\/LCA.2017.2780105","DOI":"10.1109\/LCA.2017.2780105"},{"key":"e_1_3_1_3_2","article-title":"Data Structures, Algorithms and Architectures for Efficient Regular Expression Evaluation","author":"Becchi Michela","year":"2009","unstructured":"Michela Becchi and Crowley Patrick. 2009. Data Structures, Algorithms and Architectures for Efficient Regular Expression Evaluation. Washington University, St. Louis, MO.","journal-title":"Washington University, St. Louis, MO"},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","unstructured":"Paul Dlugosch Dave Brown Paul Glendenning Michael Leventhal and Harold Noyes. 2014. An efficient and scalable semiconductor architecture for parallel automata processing. IEEE Transactions on Parallel and Distributed Systems 25 12 (2014) 3088\u20133098. 10.1109\/TPDS.2014.8","DOI":"10.1109\/TPDS.2014.8"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Andrew Putnam Adrian M. Caulfield Eric S. Chung Derek Chiou Kypros Constantinides John Demme Hadi Esmaeilzadeh Jeremy Fowers Gopi Prashanth Gopal Jan Gray Michael Haselman Scott Hauck Stephen Heil Amir Hormati Joo-Young Kim Sitaram Lanka James Larus Eric Peterson Simon Pope Aaron Smith Jason Thong Phillip Yi Xiao and Doug Burger. 2015. A reconfigurable fabric for accelerating large-scale datacenter services. IEEE Micro 35 3 (2015) 10\u201322. 10.1109\/MM.2015.42","DOI":"10.1109\/MM.2015.42"},{"key":"e_1_3_1_6_2","doi-asserted-by":"crossref","unstructured":"Arun Subramaniyan Jingcheng Wang Ezhil R. M. Balasubramanian David Blaauw Dennis Sylvester and Reetuparna Das. 2017. Cache automaton. In Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (Cambridge Massachusetts) (MICRO-50\u201917) Association for Computing Machinery New York NY 259\u2013272. 10.1145\/3123939.3123986","DOI":"10.1145\/3123939.3123986"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Danhua Guo Guangdeng Liao Laxmi N. Bhuyan Bin Liu and Jianxun Jason Ding. 2008. A scalable multithreaded L7-Filter design for multi-core servers. In Proceedings of the 4th ACM\/IEEE Symposium on Architectures for Networking and Communications Systems (San Jose California) (ANCS\u201908) . Association for Computing Machinery New York NY 60\u201368. 10.1145\/1477942.1477952","DOI":"10.1145\/1477942.1477952"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Elaheh Sadredini Reza Rahimi Marzieh Lenjani Mircea Stan and Kevin Skadron. 2020. Impala: Algorithm\/Architecture co-design for in-memory multi-stride pattern matching. In IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920) . 86\u201398. 10.1109\/HPCA47549.2020.00017","DOI":"10.1109\/HPCA47549.2020.00017"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2775054.2694347"},{"key":"e_1_3_1_10_2","volume-title":"IEEE International Symposium on Workload Characterization (IISWC\u201916)","author":"Wadden Jack","year":"2016","unstructured":"Jack Wadden, Vinh Dang, Nathan Brunelle, Tommy Tracy II, Deyuan Guo, Elaheh Sadredini, Ke Wang, Chunkun Bo, Gabriel Robins, Mircea Stan, and Kevin Skadron. 2016. ANMLzoo: a benchmark suite for exploring bottlenecks in automata processing engines and architectures. In IEEE International Symposium on Workload Characterization (IISWC\u201916). 1\u201312. 10.1109\/IISWC.2016.7581271"},{"key":"e_1_3_1_11_2","volume-title":"Design, Automation Test in Europe Conference Exhibition (DATE\u201919)","author":"Yu Jintao","year":"2019","unstructured":"Jintao Yu, Hoang Anh Du Nguyen, Muath Abu Lebdeh, Mottaqiallah Taouil, and Said Hamdioui. 2019. Time-division multiplexing automata processor. In Design, Automation Test in Europe Conference Exhibition (DATE\u201919). 794\u2013799. 10.23919\/DATE.2019.8715140"},{"key":"e_1_3_1_12_2","volume-title":"2011 ACM\/IEEE 7th Symposium on Architectures for Networking and Communications Systems","author":"Peng Kunyang","year":"2011","unstructured":"Kunyang Peng, Siyuan Tang, Min Chen, and Qunfeng Dong. 2011. Chain-based DFA deflation for fast and scalable regular expression matching using TCAM. In 2011 ACM\/IEEE 7th Symposium on Architectures for Networking and Communications Systems. 24\u201335. 10.1109\/ANCS.2011.13"},{"key":"e_1_3_1_13_2","volume-title":"2015 IEEE International Parallel and Distributed Processing Symposium","author":"Wang Ke","year":"2015","unstructured":"Ke Wang, Yanjun Qi, Jeffrey J. Fox, Mircea R. Stan, and Kevin Skadron. 2015. Association rule mining with the micron automata processor. In 2015 IEEE International Parallel and Distributed Processing Symposium. 689\u2013699. 10.1109\/IPDPS.2015.101"},{"key":"e_1_3_1_14_2","volume-title":"Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems (Providence, RI, USA) (ASPLOS\u201919)","author":"Casias Matthew","year":"2019","unstructured":"Matthew Casias, Kevin Angstadt, Tommy Tracy II, Kevin Skadron, and Westley Weimer. 2019. Debugging support for pattern-matching languages and accelerators. In Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems (Providence, RI, USA) (ASPLOS\u201919). Association for Computing Machinery, New York, NY, 1073\u20131086. 10.1145\/3297858.3304066"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/1880153.1880157"},{"key":"e_1_3_1_16_2","volume-title":"International Conference on ReConFigurable Computing and FPGAs (ReConFig\u201917)","author":"Karakchi Rasha","year":"2017","unstructured":"Rasha Karakchi, Lothrop O. Richards, and Jason D. Bakos. 2017. A dynamically reconfigurable automata processor overlay. In International Conference on ReConFigurable Computing and FPGAs (ReConFig\u201917). 1\u20138. 10.1109\/RECONFIG.2017.8279779"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2019.000-7"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2560040"},{"key":"e_1_3_1_19_2","unstructured":"Rajesh Nishtala Hans Fugal Steven Grimm Marc Kwiatkowski Herman Lee Harry C Li Ryan McElroy Mike Paleczny Daniel Peek Paul Saab et\u00a0al. 2013. Scaling memcache at facebook. In Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201913) . 385\u2013398."},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Reza Rahimi Elaheh Sadredini Mircea Stan and Kevin Skadron. 2020. Grapefruit: An open-source full-stack and customizable automata processing on FPGAs. In IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201920) . 138\u2013147. 10.1109\/FCCM48280.2020.00027","DOI":"10.1109\/FCCM48280.2020.00027"},{"key":"e_1_3_1_21_2","doi-asserted-by":"crossref","unstructured":"Yuanwei Fang Tung T. Hoang Michela Becchi and Andrew A. Chien. 2015. Fast support for unstructured data processing: The unified automata processor. In Proceedings of the 48th International Symposium on Microarchitecture (Waikiki Hawaii) (MICRO-48) . Association for Computing Machinery New York NY 533\u2013545. 10.1145\/2830772.2830809","DOI":"10.1145\/2830772.2830809"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00011"},{"key":"e_1_3_1_23_2","volume-title":"Proceedings of the 4th ACM\/IEEE Symposium on Architectures for Networking and Communications Systems (San Jose, California) (ANCS\u201908)","author":"Yang Yi-Hua E.","year":"2008","unstructured":"Yi-Hua E. Yang, Weirong Jiang, and Viktor K. Prasanna. 2008. Compact architecture for high-throughput regular expression matching on FPGA. In Proceedings of the 4th ACM\/IEEE Symposium on Architectures for Networking and Communications Systems (San Jose, California) (ANCS\u201908). Association for Computing Machinery, New York, NY, 30\u201339. 10.1145\/1477942.1477948"},{"key":"e_1_3_1_24_2","article-title":"Conversational speech transcription using context-dependent deep neural networks.","author":"Seide G. Li F.","year":"2011","unstructured":"G. Li F. Seide and D. Yu. 2011. Conversational speech transcription using context-dependent deep neural networks. In Proceedings of the 12th Annual Conference of the International Speech Communication Association.","journal-title":"In Proceedings of the 12th Annual Conference of the International Speech Communication Association"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.1376"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4302-2713-7_6"},{"key":"e_1_3_1_27_2","article-title":"An Introduction to Formal Languages and Automata.","author":"Linz Peter","year":"2006","unstructured":"Peter Linz. 2006. An Introduction to Formal Languages and Automata. Jones and Bartlett Learning.","journal-title":"Jones and Bartlett Learning"},{"key":"e_1_3_1_28_2","unstructured":"Jason D. Bakos and Rasha Karakchi. Jan 2023. nfatool. https:\/\/github.com\/HeRCLab\/nfatool."},{"key":"e_1_3_1_29_2","volume-title":"Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium","author":"Roy I.","year":"2014","unstructured":"I. Roy and S. Aluru. 2014. Finding motifs in biological sequences using the micron automata processor. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium."},{"key":"e_1_3_1_30_2","first-page":"244","volume-title":"Proceedings of the 12th International Conference on Theory and Applications of Satisfiability Testing","author":"Soos Mate","year":"2009","unstructured":"Mate Soos, Karsten Nohl, and Claude Castelluccia. 2009. Extending SAT solvers to cryptographic problems. In Proceedings of the 12th International Conference on Theory and Applications of Satisfiability Testing. 244\u2013257. DOI:10.1007\/978-3-642-02777-2_24"},{"key":"e_1_3_1_31_2","doi-asserted-by":"crossref","DOI":"10.1145\/2213836.2213863","article-title":"Skeleton automata for FPGAs: Reconfiguring without reconstructing.","author":"Jens Louis Woods Teubner,","year":"2012","unstructured":"Louis Woods Teubner, Jens, and Chongling Nie. 2012. Skeleton automata for FPGAs: Reconfiguring without reconstructing. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM.","journal-title":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM"},{"key":"e_1_3_1_32_2","volume-title":"Proceedings of the International Workshop on Architectures and Systems for Big Data (ASBD) in Conjunction with the 42nd International Symposium on Computer Architecture (ISCA\u201915)","author":"Tracy T.","year":"2015","unstructured":"T. Tracy, M. Stan, N. Brunelle, J. Wadden, K. Wang, K. Skadron, and G. Robins. 2015. Nondeterministic finite automata in hardware \u2013 the case of the Levenshtein automaton. In Proceedings of the International Workshop on Architectures and Systems for Big Data (ASBD) in Conjunction with the 42nd International Symposium on Computer Architecture (ISCA\u201915)."},{"key":"e_1_3_1_33_2","volume-title":"Proceedings of the 24th IEEE International Symposium on High-Performance Computer Architecture","author":"Wadden J.","year":"2018","unstructured":"J. Wadden, K. Angstadt, and K. Skadron. 2018. Characterizing and mitigating output reporting bottlenecks in spatial automata processing architectures. In Proceedings of the 24th IEEE International Symposium on High-Performance Computer Architecture."},{"key":"e_1_3_1_34_2","volume-title":"Proceedings of the IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines","author":"Wadden J.","year":"2017","unstructured":"J. Wadden, K. Samira Khan, and K. Skadron. 2017. Automata-to-routing: An open-source toolchain for design-space exploration of spatial automata processing architectures. In Proceedings of the IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines."},{"key":"e_1_3_1_35_2","article-title":"VASim: An Open Virtual Automata Simulator for Automata Processing Application and Architecture Research","author":"Wadden J.","year":"2016","unstructured":"J. Wadden and K. Shadron. 2016. VASim: An Open Virtual Automata Simulator for Automata Processing Application and Architecture Research. Technical Report CS2016-03, University of Virginia.","journal-title":"Technical Report CS2016-03, University of Virginia."},{"key":"e_1_3_1_36_2","volume-title":"Proceedings of the 31st ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA\u201923)","author":"Wang Xuan","year":"2023","unstructured":"Xuan Wang, Lei Gong, Jing Cao, and Wenqi Lou. 2023. hAP: A spatial-von Neumann heterogeneous automata processor with optimized resource and IO overhead on FPGA. In Proceedings of the 31st ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA\u201923)."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.129"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3593586","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3593586","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:38:01Z","timestamp":1750178281000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3593586"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,22]]},"references-count":36,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3593586"],"URL":"https:\/\/doi.org\/10.1145\/3593586","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,22]]},"assertion":[{"value":"2022-08-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-27","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-06-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}