{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,20]],"date-time":"2025-08-20T13:19:29Z","timestamp":1755695969707,"version":"3.41.0"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,8,8]],"date-time":"2022-08-08T00:00:00Z","timestamp":1659916800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2022,12,31]]},"abstract":"<jats:p>\n            We can overcome the pessimism in worst-case routing latency analysis of timing-predictable Network-on-Chip (NoC) workloads by single-digit factors through the use of a hybrid field-programmable gate array (FPGA)\u2013optimized NoC and workload-adapted regulation. Timing-predictable FPGA-optimized NoCs such as HopliteBuf integrate stall-free FIFOs that are sized using offline static analysis of a user-supplied flow pattern and rates. For certain bursty traffic and flow configurations, static analysis delivers very large, sometimes infeasible, FIFO size bounds and large worst-case latency bounds. Alternatively, backpressure-based NoCs such as HopliteBP can operate with lower latencies for certain bursty flows. However, they suffer from severe pessimism in the analysis due to the effect of pipelining of packets and interleaving of flows at switch ports. As we show in this article, a hybrid FPGA NoC that seamlessly composes both design styles on a per-switch basis delivers the best of both worlds, with improved feasibility (bounded operation) and tighter latency bounds. We select the NoC switch configuration through a novel evolutionary algorithm based on Maximum Likelihood Estimation (MLE). 
For synthetic (\n            <jats:monospace>RANDOM<\/jats:monospace>\n            ,\n            <jats:monospace>LOCAL<\/jats:monospace>\n            ) and real-world (\n            <jats:monospace>SpMV<\/jats:monospace>\n            ,\n            <jats:monospace>Graph<\/jats:monospace>\n            ) workloads, we demonstrate \u22482\u20133\u00d7 improvements in feasibility and \u22481\u20136.8\u00d7 in worst-case latency while requiring an LUT cost only \u22481\u20131.5\u00d7 larger than the cheapest HopliteBuf solution. We also deploy and verify our NoC (PL) and MLE framework (PS) on a Pynq-Z1 to adapt and reconfigure NoC switches dynamically. We can further improve a workload\u2019s routability by learning to surgically tune regulation rates for each traffic trace to maximize available routing bandwidth. We capture critical dependency between traces by modelling the regulation space as a multivariate Gaussian distribution and learn the distribution\u2019s parameters using Covariance Matrix Adaptation Evolution Strategy (CMA-ES). We also propose\n            <jats:italic>nested<\/jats:italic>\n            learning, which learns switch configurations and regulation rates in tandem. Compared with stand-alone switch learning, this symbiotic nested learning helps achieve \u2248 1.5\u00d7 lower cost constrained latency, \u2248 3.1\u00d7 faster individual rates, and \u2248 1.4\u00d7 faster mean rates. 
We also evaluate improvements to vanilla NoCs\u2019 routing using only stand-alone rate learning (no switch learning), with \u2248 1.6\u00d7 lower latency across synthetic and real-world benchmarks.\n          <\/jats:p>","DOI":"10.1145\/3507699","type":"journal-article","created":{"date-parts":[[2022,2,14]],"date-time":"2022-02-14T16:51:12Z","timestamp":1644857472000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["HopliteML: Evolving Application Customized FPGA NoCs with Adaptable Routers and Regulators"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7780-5267","authenticated-orcid":false,"given":"Gurshaant","family":"Malik","sequence":"first","affiliation":[{"name":"University of Waterloo, Ontario, Canada"}]},{"given":"Ian Elmore","family":"Lang","sequence":"additional","affiliation":[{"name":"University of Waterloo, Ontario, Canada"}]},{"given":"Rodolfo","family":"Pellizzoni","sequence":"additional","affiliation":[{"name":"University of Waterloo, Ontario, Canada"}]},{"given":"Nachiket","family":"Kapre","sequence":"additional","affiliation":[{"name":"University of Waterloo, Ontario, Canada"}]}],"member":"320","published-online":{"date-parts":[[2022,8,8]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-5041-2940-4_9"},{"key":"e_1_3_2_3_2","volume-title":"Bayes and Empirical Bayes Methods for Data Analysis","author":"Carlin Bradley P.","year":"2010","unstructured":"Bradley P. Carlin and Thomas A. Louis. 2010. Bayes and Empirical Bayes Methods for Data Analysis. 
Chapman and Hall\/CRC."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12532-018-0144-7"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293917"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3375899"},{"key":"e_1_3_2_7_2","article-title":"The CMA evolution strategy: A tutorial","author":"Hansen Nikolaus","year":"2016","unstructured":"Nikolaus Hansen. 2016. The CMA evolution strategy: A tutorial. arXiv:1604.00772.","journal-title":"arXiv:1604.00772"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2012.6412110"},{"key":"e_1_3_2_9_2","first-page":"588","volume-title":"13th International Conference on Advanced Communication Technology (ICACT\u201911)","author":"Jeon S.","year":"2011","unstructured":"S. Jeon, J. Cho, Y. Jung, S. Park, and T. Han. 2011. Automotive hardware development according to ISO 26262. In 13th International Conference on Advanced Communication Technology (ICACT\u201911). 588\u2013592."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2015.7293956"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3027486"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689081"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCOM.1987.1096719"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2015.2405614"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2012.241"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45318-0"},{"key":"e_1_3_2_17_2","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. (June 2014). 
https:\/\/snap.stanford.edu\/citing.html."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL50879.2020.00015"},{"key":"e_1_3_2_19_2","first-page":"154","volume-title":"IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201919)","author":"Malik G. S.","year":"2019","unstructured":"G. S. Malik and N. Kapre. 2019. Enhancing butterfly fat tree NoCs for FPGAs with lightweight flow control. In IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201919). 154\u2013162."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/2145694.2145703"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080254"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2020.2987797"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2488490"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.7873\/DATE.2015.0418"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISMVL.1998.679468"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293908"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2017.8280122"},{"key":"e_1_3_2_28_2","unstructured":"Saud Wasly, Rodolfo Pellizzoni, and Nachiket Kapre. 2017. Worst case latency analysis for Hoplite FPGA-based NoC. (2017). 
https:\/\/uwspace.uwaterloo.ca\/handle\/10012\/12600."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485972"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3507699","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3507699","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:15Z","timestamp":1750183815000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3507699"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,8]]},"references-count":28,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12,31]]}},"alternative-id":["10.1145\/3507699"],"URL":"https:\/\/doi.org\/10.1145\/3507699","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2022,8,8]]},"assertion":[{"value":"2021-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-08-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}