{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T15:02:10Z","timestamp":1779894130955,"version":"3.53.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T00:00:00Z","timestamp":1779840000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2026,6,30]]},"abstract":"<jats:p>The scientific community increasingly relies on Machine Learning (ML) for near-sensor processing, leveraging its strengths in tasks such as pattern recognition, anomaly detection, and real-time decision-making. These deployments demand accelerators that combine extremely high performance with programmability, ease of integration, and straightforward verification. We present <jats:sc>cgra4ml<\/jats:sc>, an open source, modular framework that generates parameterizable CGRA accelerators in synthesizable SystemVerilog RTL, tailored to common ML compute patterns found in scientific applications. The framework supports seamless system integration through AXI-compliant interfaces and open source DMA components, and it includes automatic firmware generation for programming the accelerator. 
A comprehensive verification suite and a runtime firmware stack further support deployment across diverse SoC platforms. <jats:sc>cgra4ml<\/jats:sc> provides a modular, full-stack infrastructure, including a Python API, SystemVerilog hardware, TCL toolflows, and a C runtime, which facilitates easy integration and experimentation, allowing scientists to focus on innovation rather than dealing with the intricacies of hardware design and optimization. We demonstrate the effectiveness of <jats:sc>cgra4ml<\/jats:sc> to implement common scientific edge neural networks using ASIC and FPGA design flows.<\/jats:p>","DOI":"10.1145\/3801097","type":"journal-article","created":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T14:12:14Z","timestamp":1773151934000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["<scp>cgra4ml<\/scp>: A Hardware\/Software Framework to Implement Neural Networks for Scientific Edge Computing"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9768-5349","authenticated-orcid":false,"given":"G.","family":"Abarajithan","sequence":"first","affiliation":[{"name":"Computer Science and Engineering, University of California San Diego, La Jolla, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1046-5883","authenticated-orcid":false,"given":"Zhenghua","family":"Ma","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, University of California San Diego, La Jolla, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-4544-9542","authenticated-orcid":false,"given":"Ravidu","family":"Munasinghe","sequence":"additional","affiliation":[{"name":"Department of Electronic and 
Telecommunications, University of Moratuwa, Moratuwa, Sri Lanka"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6955-1888","authenticated-orcid":false,"given":"Francesco","family":"Restuccia","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, University of California San Diego, La Jolla, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9062-5570","authenticated-orcid":false,"given":"Ryan","family":"Kastner","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, University of California San Diego, La Jolla, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,5,27]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/LES.2024.3354081"},{"key":"e_1_3_1_3_2","unstructured":"Alex Forencich. 2024. Verilog AXI Components. Retrieved from https:\/\/github.com\/alexforencich\/verilog-axi\/"},{"key":"e_1_3_1_4_2","unstructured":"Alex Forencich. 2024. Verilog AXI Stream Components. Retrieved from https:\/\/github.com\/alexforencich\/verilog-axis\/"},{"key":"e_1_3_1_5_2","unstructured":"AMD-Xilinx. 2024. Vitis AI. Retrieved from https:\/\/www.xilinx.com\/products\/design-tools\/vitis\/vitis-ai.html"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.2172\/2204990"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/19\/03\/P03013"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3570928"},{"key":"e_1_3_1_9_2","unstructured":"Hendrik Borras Giuseppe Di Guglielmo Javier Duarte Nicol\u00f2 Ghielmetti Ben Hawks Scott Hauck Shih-Chieh Hsu Ryan Kastner Jason Liang Andres Meza et al. 2022. Open-source FPGA-ML codesign for the MLPerf tiny benchmark. arXiv:2206.11791. 
Retrieved from https:\/\/arxiv.org\/abs\/2206.11791"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2017.7995277"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-021-00356-5"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2333660.2333747"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/2650280.2650333"},{"key":"e_1_3_1_15_2","unstructured":"Renesas Electronics Corporation. [n.\u2009d.]. Renesas STP Engine (IP Core). Retrieved December 9 2023 from https:\/\/web.archive.org\/web\/20231209041533\/https:\/\/www.renesas.com\/us\/en\/key-technologies\/artificial-intelligence\/stp-engine"},{"key":"e_1_3_1_16_2","unstructured":"Daniel Newbrook David Flynn and John Darlington. 2023. Nanosoc Re-usable MCU Platform. Retrieved from https:\/\/soclabs.org\/project\/nanosoc-re-usable-mcu-platform"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.3389\/fdata.2022.787421"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3229767"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNS.2021.3087100"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750389"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/13\/07\/P07027"},{"key":"e_1_3_1_22_2","unstructured":"Javier Duarte Nhan Tran Ben Hawks Christian Herwig Jules Muhizi Shvetank Prakash and Vijay Janapa Reddi. 2022. FastML science benchmarks: Accelerating real-time scientific edge machine learning. arXiv:2207.07958. Retrieved from https:\/\/arxiv.org\/abs\/2207.07958"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","unstructured":"FastML Team. 2024. fastmachinelearning\/hls4ml. 
DOI: 10.5281\/zenodo.1201549","DOI":"10.5281\/zenodo.1201549"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/2.839324"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2012.51"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_27_2","unstructured":"Intel. 2024. OpenVINO\u2122 Toolkit. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/tools\/openvino-toolkit\/overview.html"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1561\/1000000060"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2012.6412157"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357375"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC56929.2023.10247873"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/296399.296444"},{"key":"e_1_3_1_33_2","unstructured":"Mathworks. 2024. Deep Learning Processor IP Core Generation for Custom Board. Retrieved from https:\/\/www.mathworks.com\/help\/deep-learning-hdl\/ug\/define-custom-board-and-reference-design-for-dl-ip-core-workflow.html"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPGA.1996.564808"},{"key":"e_1_3_1_35_2","unstructured":"Thierry Moreau Tianqi Chen Ziheng Jiang Luis Ceze Carlos Guestrin and Arvind Krishnamurthy. 2018. A hardware-software blueprint for flexible deep learning specialization. arXiv:1807.04188. Retrieved from https:\/\/arxiv.org\/abs\/1807.04188"},{"key":"e_1_3_1_36_2","unstructured":"Daniel H. Noronha Bahar Salehpour and Steven J. E. Wilton. 2018. LeFlow: Enabling flexible FPGA high-level synthesis of tensorflow deep neural networks. arXiv:1807.05317. Retrieved from https:\/\/arxiv.org\/abs\/1807.05317"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","unstructured":"Alessandro Pappalardo. 2024. Xilinx\/brevitas. 
DOI: 10.5281\/zenodo.3333552","DOI":"10.5281\/zenodo.3333552"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2022.102561"},{"key":"e_1_3_1_39_2","first-page":"652","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Charles R.","year":"2017","unstructured":"Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 652\u2013660."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3676641.3716013"},{"key":"e_1_3_1_41_2","unstructured":"QKeras Team. 2024. QKeras. Retrieved from https:\/\/github.com\/google\/qkeras"},{"key":"e_1_3_1_42_2","first-page":"18281","volume-title":"International Conference on Machine Learning","author":"Qu Huilin","year":"2022","unstructured":"Huilin Qu, Congqiao Li, and Sitian Qian. 2022. Particle transformer for jet tagging. In International Conference on Machine Learning. PMLR, 18281\u201318292."},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/NSS\/MIC44845.2022.10399017"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.859540"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD57390.2023.10323910"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD50377.2020.00070"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE51398.2021.9473955"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3482854"},{"key":"e_1_3_1_49_2","unstructured":"OpenAI Team. 2025. OpenAI Triton. Retrieved from https:\/\/openai.com\/index\/triton\/"},{"key":"e_1_3_1_50_2","unstructured":"PyTorch Team. 2025. PyTorch Dynamo. Retrieved from https:\/\/docs.pytorch.org\/docs\/stable\/torch.compiler_dynamo_overview.html"},{"key":"e_1_3_1_51_2","unstructured":"TinyML Team. 2025. 
MLPerf\u2122 Tiny Deep Learning Benchmarks for Embedded Devices. Retrieved from https:\/\/github.com\/mlcommons\/tiny"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_3_1_53_2","first-page":"NP11","volume-title":"APS Division of Plasma Physics Meeting Abstracts","author":"Wei Yumou","year":"2023","unstructured":"Yumou Wei, David Arnold, Rian Chandra, Nigel Dasilva, Christopher Hansen, Jeffrey Levesque, Boting Li, Matthew Notis, Michael Mauel, Gerald Navratil, et al. 2023. FPGA-based microsecond-latency MHD mode tracking using high-speed cameras and deep learning on HBT-EP. In APS Division of Plasma Physics Meeting Abstracts, NP11\u2013088."},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2785257"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240801"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3801097","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T14:04:35Z","timestamp":1779890675000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3801097"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,27]]},"references-count":54,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,6,30]]}},"alternative-id":["10.1145\/3801097"],"URL":"https:\/\/doi.org\/10.1145\/3801097","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,27]]},"assertion":[{"value":"2025-06-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2026-02-16","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-05-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}