{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T16:06:35Z","timestamp":1780675595271,"version":"3.54.1"},"reference-count":105,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,11,22]],"date-time":"2024-11-22T00:00:00Z","timestamp":1732233600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>\n            Stream processing, the real-time computation of data as it is created or received, is vital for many applications, particularly wireless communication. Evolving protocols, high-throughput requirements, and the challenge of handling diverse processing patterns make it demanding. Traditional platforms struggle to meet real-time throughput and latency requirements because of large data volumes, sequential and nondeterministic data arrival, and variable data rates, which lead to inefficiencies in memory access and parallel processing. We present Canalis, a throughput-optimized framework designed to address these challenges, delivering high performance at low energy consumption. Canalis is a hardware-software co-designed system. It includes a programmable spatial architecture proposed in this work, the Flux Stream Processing Unit (FluxSPU), designed to enhance data throughput and energy efficiency. FluxSPU is accompanied by a software stack that eases programming. We evaluated Canalis on eight distinct benchmarks. 
To demonstrate the effectiveness of domain specialization, we compare Canalis with the CPU and GPU of a mobile SoC: Canalis achieves average speedups of 13.4\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            and 6.6\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , and energy savings of 189.8\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            and 283.9\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , respectively. 
Compared with equivalent ASICs for the benchmarks, Canalis incurs an average energy overhead within 2.4\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , preserving generality without incurring significant overhead.\n          <\/jats:p>","DOI":"10.1145\/3695880","type":"journal-article","created":{"date-parts":[[2024,9,18]],"date-time":"2024-09-18T15:27:33Z","timestamp":1726673253000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless Communication"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4168-6446","authenticated-orcid":false,"given":"Kuan-Yu","family":"Chen","sequence":"first","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4123-8480","authenticated-orcid":false,"given":"Thomas","family":"Mason Nelson","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7615-5514","authenticated-orcid":false,"given":"Alireza","family":"Khadem","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5598-281X","authenticated-orcid":false,"given":"Morteza","family":"Fayazi","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8688-1843","authenticated-orcid":false,"given":"Sanjay Sri 
Vallabh","family":"Singapuram","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4188-4650","authenticated-orcid":false,"given":"Ronald","family":"Dreslinski","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2457-4119","authenticated-orcid":false,"given":"Nishil","family":"Talati","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6658-5502","authenticated-orcid":false,"given":"Hun-Seok","family":"Kim","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6744-7075","authenticated-orcid":false,"given":"David","family":"Blaauw","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,11,22]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"ARM-software. 2023. Arm Optimized Routines. Retrieved from https:\/\/github.com\/ARM-software\/optimized-routines"},{"key":"e_1_3_1_3_2","unstructured":"clMathLibraries. 2017. clBLAS. Retrieved from https:\/\/github.com\/clMathLibraries\/clBLAS"},{"key":"e_1_3_1_4_2","unstructured":"clMathLibraries. 2016. clFFT. Retrieved from https:\/\/github.com\/clMathLibraries\/clFFT"},{"key":"e_1_3_1_5_2","unstructured":"ARM-software. 2024. CMSIS-DSP. Retrieved from https:\/\/github.com\/ARM-software\/CMSIS-DSP"},{"key":"e_1_3_1_6_2","unstructured":"marton78. 2022. PFFFT: A pretty fast FFT and fast convolution with PFFASTCONV. 
Retrieved from https:\/\/github.com\/marton78\/pffft"},{"key":"e_1_3_1_7_2","unstructured":"GitHub. 2024. XNNPACK. Retrieved from https:\/\/github.com\/google\/XNNPACK"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.48862"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACSSC.2003.1292368"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2008.1"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507772"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2003.1183551"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1995.479579"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2638459"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/1037949.1024396"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLSITechnologyandCir46769.2022.9830509"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.5555\/647927.739401"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLSITechnologyandCir46769.2022.9830330"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3084804"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00014"},{"key":"e_1_3_1_21_2","first-page":"141","volume-title":"Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO \u201936)","author":"Ciricescu Silviu","year":"2003","unstructured":"Silviu Ciricescu, Ray Essick, Brian Lucas, Phil May, Kent Moat, Jim Norris, Michael Schuette, and Ali Saidi. 2003. The Reconfigurable Streaming Vector Processor (RSVP\u2122). In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO \u201936). 
IEEE Computer Society, 141."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2596667"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2014.12"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPGA.1998.707889"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358276"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPGA.1996.242438"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/641675.642111"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00025"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.5555\/647923.741212"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.1999.744349"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00023"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446059"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2011.2178275"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00084"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO56248.2022.00046"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358277"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605428"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2012.51"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.5555\/2014698.2014884"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815968"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSSC.1968.300136"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872887.2750390"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","unstructured":"Olivia Hsu, Alexander Rucker, Tian Zhao, Kunle Olukotun, and Fredrik Kjolstad. 2022. 
Stardust: Compiling sparse tensor algebra to a reconfigurable dataflow architecture. DOI: 10.48550\/ARXIV.2211.03251","DOI":"10.48550\/ARXIV.2211.03251"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/2.241423"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2006.7477853"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062262"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/40.918001"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.20"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3296979.3192379"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2004.1310763"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ARRAYS.1988.18106"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1982.1653825"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1987.13876"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/291006.291018"},{"key":"e_1_3_1_55_2","first-page":"129","article-title":"Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators","author":"Lee Yunsup","year":"2011","unstructured":"Yunsup Lee, Rimas Avizienis, Alex Bishara, Richard Xia, Derek Lockhart, Christopher Batten, and Krste Asanovi\u0107. 2011. Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators. 
In 2011 38th Annual International Symposium on Computer Architecture (ISCA \u201911), 129\u2013140.","journal-title":"2011 38th Annual International Symposium on Computer Architecture (ISCA \u201911)"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.37"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2593105"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2000.854387"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2002.1188678"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45234-8_7"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582070"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/FTCS.1989.105549"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPGA.1996.564808"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/275107.275164"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2001.991104"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080255"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872887.2750380"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/1028176.1006714"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2013.17"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485935"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2014.2315627"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/2954679.2872415"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2018.032271058"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.1993.397124"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3
466752.3480047"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/359327.359336"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/980152.980156"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.19"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/3446382.3448360"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.859540"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2019.8662346"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.vlsi.2017.02.002"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/1233307.1233308"},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00030"},{"key":"e_1_3_1_86_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45937-5_14"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00042"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACSSC.2008.5074384"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2009.2013772"},{"key":"e_1_3_1_90_2","doi-asserted-by":"publisher","DOI":"10.1145\/3193827"},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-89159-6_26"},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322229"},{"key":"e_1_3_1_93_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00032"},{"key":"e_1_3_1_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00032"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00063"},{"key":"e_1_3_1_96_2","doi-asserted-by":"publisher","DOI":"10.1145\/3358177"},{"key":"e_1_3_1_97_2","doi-asserted-by":"publisher","DOI":"10.1109\/SAMOS.2016.7818353"},{"key":"e_1_3_1_98_2","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_3_1_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPGA.1
996.564773"},{"key":"e_1_3_1_100_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555773"},{"key":"e_1_3_1_101_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.125"},{"key":"e_1_3_1_102_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.1995.535451"},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2006.1696225"},{"key":"e_1_3_1_104_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2007.916616"},{"key":"e_1_3_1_105_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLSIC.2014.6858388"},{"key":"e_1_3_1_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/4.881217"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3695880","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3695880","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:04:29Z","timestamp":1750291469000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3695880"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,22]]},"references-count":105,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3695880"],"URL":"https:\/\/doi.org\/10.1145\/3695880","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,22]]},"assertion":[{"value":"2024-01-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-28","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-11-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}