{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,24]],"date-time":"2025-10-24T16:48:21Z","timestamp":1761324501189,"version":"3.41.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T00:00:00Z","timestamp":1681948800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Software Analysis for Heterogeneous Computing Architectures","award":["191497"],"award-info":[{"award-number":["191497"]}]},{"name":"Swiss National Science Foundation (SNSF), by the National Science Foundation","award":["CCF 16-19245"],"award-info":[{"award-number":["CCF 16-19245"]}]},{"name":"NSF","award":["CNS-1718160"],"award-info":[{"award-number":["CNS-1718160"]}]},{"name":"DARPA through the Domain-Specific System on Chip"},{"name":"Applications Driving Architectures (ADA) Research Center"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,5,31]]},"abstract":"<jats:p>\n            The design of heterogeneous systems that include domain specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop level, task level, and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present\n            <jats:monospace>Trireme<\/jats:monospace>\n            , a fully automated tool-chain that explores multiple levels of parallelism and produces domain-specific accelerator designs and configurations that maximize performance, given an area budget. FPGA SoCs were used as target platforms, and Catapult HLS\u00a0[\n            <jats:xref ref-type=\"bibr\">7<\/jats:xref>\n            ] was used to synthesize RTL using a commercial 12 nm FinFET technology. Experiments on demanding benchmarks from the XR domain revealed a speedup of up to 20\u00d7, as well as a speedup of up to 37\u00d7 for smaller applications, compared to software-only implementations.\n          <\/jats:p>","DOI":"10.1145\/3580394","type":"journal-article","created":{"date-parts":[[2023,1,17]],"date-time":"2023-01-17T12:00:12Z","timestamp":1673956812000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Trireme: Exploration of Hierarchical Multi-level Parallelism for Hardware Acceleration"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6644-5200","authenticated-orcid":false,"given":"Georgios","family":"Zacharopoulos","sequence":"first","affiliation":[{"name":"Harvard University, Cambridge, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3037-8816","authenticated-orcid":false,"given":"Adel","family":"Ejjeh","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5299-4746","authenticated-orcid":false,"given":"Ying","family":"Jing","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0281-4086","authenticated-orcid":false,"given":"En-Yu","family":"Yang","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4570-4613","authenticated-orcid":false,"given":"Tianyu","family":"Jia","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0403-856X","authenticated-orcid":false,"given":"Iulian","family":"Brumar","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1384-992X","authenticated-orcid":false,"given":"Jeremy","family":"Intan","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9245-527X","authenticated-orcid":false,"given":"Muhammad","family":"Huzaifa","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3403-5119","authenticated-orcid":false,"given":"Sarita","family":"Adve","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0760-9690","authenticated-orcid":false,"given":"Vikram","family":"Adve","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, IL, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5730-9904","authenticated-orcid":false,"given":"Gu-Yeon","family":"Wei","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0662-7889","authenticated-orcid":false,"given":"David","family":"Brooks","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,4,20]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_3_2_3_2","first-page":"575","volume-title":"Communications ACM","author":"Bron Coen","year":"1973","unstructured":"Coen Bron and Joep Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. In Communications ACM, Vol. 9. 575\u2013577."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3546070"},{"key":"e_1_3_2_5_2","article-title":"Stratus High-Level Synthesis","year":"2016","unstructured":"Cadence. 2016. Stratus High-Level Synthesis. Retrieved from https:\/\/www.cadence.com\/en_US\/home\/tools\/digital-design-and-signoff\/synthesis\/stratus-high-level-synthesis.html.","journal-title":"Retrieved from https:\/\/www.cadence.com\/en_US\/home\/tools\/digital-design-and-signoff\/synthesis\/stratus-high-level-synthesis.html"},{"key":"e_1_3_2_6_2","first-page":"217","volume-title":"Proceedings of the ACM\/IEEE 41st International Symposium on Computer Architecture (ISCA)","author":"Campanoni Simone","year":"2014","unstructured":"Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy M. Jones, Gu-Yeon Wei, and David Brooks. 2014. HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs. In Proceedings of the ACM\/IEEE 41st International Symposium on Computer Architecture (ISCA). IEEE, 217\u2013228."},{"key":"e_1_3_2_7_2","volume-title":"Proceedings of the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems","author":"Canis Andrew","year":"2013","unstructured":"Andrew Canis, Jongsok Choi, Blair Fort, Ruolong Lian, Qijing Huang, Nazanin Calagar, Marcel Gort, Jia Jun Qin, Mark Aldham, Tomasz Czajkowski. et\u00a0al. 2013. From software to accelerators with LegUp high-level synthesis. In Proceedings of the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems. IEEE."},{"key":"e_1_3_2_8_2","article-title":"Catapult High-level Synthesis","year":"2017","unstructured":"Catapult. 2017. Catapult High-level Synthesis.. Retrieved from https:\/\/eda.sw.siemens.com\/en-US\/ic\/ic-design\/high-level-synthesis-and-verification-platform\/.","journal-title":"Retrieved from https:\/\/eda.sw.siemens.com\/en-US\/ic\/ic-design\/high-level-synthesis-and-verification-platform\/"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3385983"},{"key":"e_1_3_2_10_2","first-page":"365","volume-title":"ACM SIGARCH Computer Architecture News","author":"Esmaeilzadeh Hadi","year":"2011","unstructured":"Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In ACM SIGARCH Computer Architecture News, Vol. 39. 365\u2013376."},{"key":"e_1_3_2_11_2","article-title":"A graph deep learning framework for high-level synthesis design space exploration","author":"Ferretti Lorenzo","year":"2021","unstructured":"Lorenzo Ferretti, Andrea Cini, Georgios Zacharopoulos, Cesare Alippi, and Laura Pozzi. 2021. A graph deep learning framework for high-level synthesis design space exploration. arXiv preprint arXiv:2111.14767 (2021).","journal-title":"arXiv preprint arXiv:2111.14767"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3570925"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC53511.2021.00014"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192379"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178493"},{"key":"e_1_3_2_16_2","first-page":"1","volume-title":"Proceedings of the International Conference on Supercomputing","author":"Kumar Snehasish","year":"2016","unstructured":"Snehasish Kumar, Vijayalakshmi Srinivasan, Amirali Sharifian, Nick Sumner, and Arrvindh Shriraman. 2016. Peruse and profit: Estimating the accelerability of loops. In Proceedings of the International Conference on Supercomputing. 1\u201313."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293910"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281665"},{"key":"e_1_3_2_19_2","unstructured":"LLVM Project. Circuit IR Compilers and Tools (CIRCT). https:\/\/github.com\/llvm\/circt."},{"key":"e_1_3_2_20_2","first-page":"245","volume-title":"Proceedings of the 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Margerm Steven","year":"2018","unstructured":"Steven Margerm, Amirali Sharifian, Apala Guha, Arrvindh Shriraman, and Gilles Pokam. 2018. TAPAS: Generating parallel accelerators from parallel programs. In Proceedings of the 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 245\u2013257."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10617-012-9096-8"},{"key":"e_1_3_2_22_2","first-page":"425","volume-title":"Proceedings of the IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)","author":"Nardi Luigi","year":"2019","unstructured":"Luigi Nardi, Artur Souza, David Koeplinger, and Kunle Olukotun. 2019. HyperMapper: A practical design space exploration framework. In Proceedings of the IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). IEEE, 425\u2013426."},{"key":"e_1_3_2_23_2","first-page":"5","volume-title":"Proceedings of the ACM\/SIGDA International Symposium on Field-programmable Gate Arrays","author":"Nguyen Tan","year":"2016","unstructured":"Tan Nguyen, Swathi Gurumani, Kyle Rupnow, and Deming Chen. 2016. FCUDA-SoC: Platform integration for field-programmable SoC with the CUDA-to-FPGA compiler. In Proceedings of the ACM\/SIGDA International Symposium on Field-programmable Gate Arrays. 5\u201314."},{"key":"e_1_3_2_24_2","first-page":"35","volume-title":"Proceedings of the IEEE 7th Symposium on Application Specific Processors","author":"Papakonstantinou Alexandros","year":"2009","unstructured":"Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, and Wen-Mei W. Hwu. 2009. FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs. In Proceedings of the IEEE 7th Symposium on Application Specific Processors. IEEE, 35\u201342."},{"key":"e_1_3_2_25_2","volume-title":"Proceedings of the 23rd International Conference on Field Programmable Logic and Applications","author":"Pilato Christian","year":"2012","unstructured":"Christian Pilato and Fabrizio Ferrandi. 2012. Bambu: A free framework for the high level synthesis of complex applications. In Proceedings of the 23rd International Conference on Field Programmable Logic and Applications."},{"key":"e_1_3_2_26_2","first-page":"110","volume-title":"Proceedings of the IEEE International Symposium on Workload Characterization (IISWC)","author":"Reagen Brandon","year":"2014","unstructured":"Brandon Reagen, Robert Adolf, Yakun Sophia Shao, Gu-Yeon Wei, and David Brooks. 2014. MachSuite: Benchmarks for accelerator design and customized architectures. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC). IEEE, 110\u2013119."},{"key":"e_1_3_2_27_2","first-page":"471","volume-title":"Proceedings of the 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Rogers Samuel","year":"2020","unstructured":"Samuel Rogers, Joshua Slycord, Mohammadreza Baharani, and Hamed Tabkhi. 2020. gem5-SALAM: A system architecture for LLVM-based accelerator modeling. In Proceedings of the 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 471\u2013482."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018758"},{"key":"e_1_3_2_29_2","first-page":"97","volume-title":"Proceedings of the 41st Annual International Symposium on Computer Architecture","author":"Shao Yakun Sophia","year":"2014","unstructured":"Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, and David Brooks. 2014. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures. In Proceedings of the 41st Annual International Symposium on Computer Architecture. IEEE, 97\u2013108."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195697"},{"key":"e_1_3_2_31_2","first-page":"40","article-title":"Moore\u2019s law is dead. Now what?","volume":"13","author":"Simonite Tom","year":"2016","unstructured":"Tom Simonite. 2016. Moore\u2019s law is dead. Now what? MIT Technol. Rev. May 13 (2016), 40\u201341.","journal-title":"MIT Technol. Rev. May"},{"key":"e_1_3_2_32_2","article-title":"Parboil: A revised benchmark suite for scientific and commercial throughput computing","volume":"127","author":"Stratton John A.","year":"2012","unstructured":"John A. Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, and Wen-mei W. Hwu. 2012. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Cent. Reliab. High-perform. Comput. 127 (2012).","journal-title":"Cent. Reliab. High-perform. Comput."},{"key":"e_1_3_2_33_2","article-title":"Vivado High-level Synthesis","year":"2017","unstructured":"Xilinx. 2017. Vivado High-level Synthesis. Retrieved from www.xilinx.com\/products\/design-tools\/vivado\/integration\/esl-design.html.","journal-title":"Retrieved from www.xilinx.com\/products\/design-tools\/vivado\/integration\/esl-design.html"},{"key":"e_1_3_2_34_2","article-title":"Xilinx All Programmable SoC portfolio","year":"2017","unstructured":"Xilinx. 2017. Xilinx All Programmable SoC portfolio. Retrieved from www.xilinx.com\/products\/silicon-devices\/soc.html.","journal-title":"Retrieved from www.xilinx.com\/products\/silicon-devices\/soc.html"},{"key":"e_1_3_2_35_2","unstructured":"Yuan Yao and Saketh Rama. yaoyuannnn. CAVA: Camera Vision Pipeline on gem5-Aladdin. https:\/\/github.com\/yaoyuannnn\/cava."},{"key":"e_1_3_2_36_2","first-page":"91","article-title":"Machine learning approach for loop unrolling factor prediction in high level synthesis","author":"Zacharopoulos Georgios","year":"2018","unstructured":"Georgios Zacharopoulos, Andrea Barbon, Giovanni Ansaloni, and Laura Pozzi. 2018. Machine learning approach for loop unrolling factor prediction in high level synthesis. In Proceedings of the IEEE International Conference on High Performance Computing & Simulation (HPCS). 91\u201397.","journal-title":"Proceedings of the IEEE International Conference on High Performance Computing & Simulation (HPCS)"},{"key":"e_1_3_2_37_2","first-page":"1","article-title":"Compiler-assisted selection of hardware acceleration candidates from application source code","author":"Zacharopoulos Georgios","year":"2019","unstructured":"Georgios Zacharopoulos, Lorenzo Ferretti, Giovanni Ansaloni, Giuseppe Di Guglielmo, Luca Carloni, and Laura Pozzi. 2019. Compiler-assisted selection of hardware acceleration candidates from application source code. In Proceedings of the International Conference on Computer Design. 1\u20139.","journal-title":"Proceedings of the International Conference on Computer Design"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2818689"},{"key":"e_1_3_2_39_2","volume-title":"ClrFreqCFGPrinter: A Tool for Frequency Annotated Control Flow Graph Generation","author":"Zacharopoulos Georgios","year":"2017","unstructured":"Georgios Zacharopoulos and Laura Pozzi. 2017. ClrFreqCFGPrinter: A Tool for Frequency Annotated Control Flow Graph Generation. Technical Report. European LLVM Developers Meeting."},{"key":"e_1_3_2_40_2","first-page":"15","volume-title":"Proceedings of the IEEE\/ACM International Symposium on Code Generation and Optimization (CGO)","author":"Zhou Ruoyu","year":"2019","unstructured":"Ruoyu Zhou and Timothy M. Jones. 2019. Janus: Statically-driven and profile-guided automatic dynamic binary parallelisation. In Proceedings of the IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 15\u201325."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580394","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580394","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:42Z","timestamp":1750178262000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580394"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,20]]},"references-count":39,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,5,31]]}},"alternative-id":["10.1145\/3580394"],"URL":"https:\/\/doi.org\/10.1145\/3580394","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2023,4,20]]},"assertion":[{"value":"2021-10-06","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-09","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-04-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}