{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,30]],"date-time":"2026-06-30T15:57:11Z","timestamp":1782835031277,"version":"3.54.5"},"reference-count":99,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T00:00:00Z","timestamp":1701734400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Intel\/NSF CAPA program, the NSF NeuroNex Award","award":["DBI-1707408"],"award-info":[{"award-number":["DBI-1707408"]}]},{"name":"NIH","award":["U01MH117079"],"award-info":[{"award-number":["U01MH117079"]}]},{"name":"NSERC Discovery","award":["RGPIN-2019-04613 and DGECR-2019-00120"],"award-info":[{"award-number":["RGPIN-2019-04613 and DGECR-2019-00120"]}]},{"name":"CFI John R. Evans Leaders Fund, NSF projects","award":["1937599 and 2211557"],"award-info":[{"award-number":["1937599 and 2211557"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>\n            In this article, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient APIs that allows users to easily express flexible and complex inter-task communication structures. Second, TAPA adopts a coarse-grained floorplanning step during HLS compilation for accurate pipelining of potential critical paths. In addition, TAPA implements several optimization techniques specifically tailored for modern HBM-based FPGAs. 
In our experiments with a total of 43 designs, we improve the average frequency from 147 MHz to 297 MHz (a 102% improvement) with no loss of throughput and a negligible change in resource utilization. Notably, in 16 experiments, we make the originally unroutable designs achieve 274 MHz, on average. The framework is available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"url\" xlink:href=\"https:\/\/github.com\/UCLA-VAST\/tapa\">https:\/\/github.com\/UCLA-VAST\/tapa<\/jats:ext-link>\n            and the core floorplan module is available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"url\" xlink:href=\"https:\/\/github.com\/UCLA-VAST\/AutoBridge\">https:\/\/github.com\/UCLA-VAST\/AutoBridge<\/jats:ext-link>\n          <\/jats:p>","DOI":"10.1145\/3609335","type":"journal-article","created":{"date-parts":[[2023,9,18]],"date-time":"2023-09-18T11:56:27Z","timestamp":1695038187000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":40,"title":["TAPA: A Scalable Task-parallel Dataflow Programming Framework for Modern FPGAs with Co-optimization of HLS and Physical Design"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0705-9510","authenticated-orcid":false,"given":"Licheng","family":"Guo","sequence":"first","affiliation":[{"name":"University of California Los Angeles, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5885-0425","authenticated-orcid":false,"given":"Yuze","family":"Chi","sequence":"additional","affiliation":[{"name":"University of California Los Angeles, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0751-8227","authenticated-orcid":false,"given":"Jason","family":"Lau","sequence":"additional","affiliation":[{"name":"University of California Los Angeles, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7450-2842","authenticated-orcid":false,"given":"Linghao","family":"Song","sequence":"additional","affiliation":[{"name":"University of California Los Angeles, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6244-2101","authenticated-orcid":false,"given":"Xingyu","family":"Tian","sequence":"additional","affiliation":[{"name":"Simon Fraser University, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-0081-9812","authenticated-orcid":false,"given":"Moazin","family":"Khatti","sequence":"additional","affiliation":[{"name":"Simon Fraser University, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1946-2021","authenticated-orcid":false,"given":"Weikang","family":"Qiao","sequence":"additional","affiliation":[{"name":"University of California Los Angeles, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2133-7943","authenticated-orcid":false,"given":"Jie","family":"Wang","sequence":"additional","affiliation":[{"name":"University of California Los Angeles, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6546-1367","authenticated-orcid":false,"given":"Ecenur","family":"Ustun","sequence":"additional","affiliation":[{"name":"Cornell University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0603-9697","authenticated-orcid":false,"given":"Zhenman","family":"Fang","sequence":"additional","affiliation":[{"name":"Simon Fraser University, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0778-0308","authenticated-orcid":false,"given":"Zhiru","family":"Zhang","sequence":"additional","affiliation":[{"name":"Cornell 
University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2887-6963","authenticated-orcid":false,"given":"Jason","family":"Cong","sequence":"additional","affiliation":[{"name":"University of California Los Angeles, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,12,5]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2018.00068"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1201\/9781420013481"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2009.2015738"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.5555\/800262.809144"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.5555\/921086"},{"key":"e_1_3_2_7_2","unstructured":"Cadence. 2020. Retrieved from https:\/\/www.cadence.com\/"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/43.945302"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/337292.337441"},{"key":"e_1_3_2_10_2","first-page":"1185","volume-title":"Design, Automation & Test in Europe Conference & Exhibition (DATE\u201912)","author":"Chen Yibo","year":"2012","unstructured":"Yibo Chen, Guangyu Sun, Qiaosha Zou, and Yuan Xie. 2012. 3DHLS: Incorporating high-level synthesis in physical planning of three-dimensional (3D) ICs. In Design, Automation & Test in Europe Conference & Exhibition (DATE\u201912). IEEE, 1185\u20131190."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT47387.2019.00071"},{"key":"e_1_3_2_12_2","first-page":"288","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Cheng Jianyi","year":"2020","unstructured":"Jianyi Cheng, Lana Josipovic, George A. Constantinides, Paolo Ienne, and John Wickerson. 2020. Combining dynamic & static scheduling in high-level synthesis. 
In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 288\u2013298."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2021.3065902"},{"key":"e_1_3_2_14_2","first-page":"89","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Cheng Jianyi","year":"2022","unstructured":"Jianyi Cheng, John Wickerson, and George A. Constantinides. 2022. Finding and finessing static islands in dynamically scheduled circuits. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 89\u2013100."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2006.882481"},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1109\/GLSV.1994.290003","volume-title":"4th Great Lakes Symposium on VLSI","author":"Cherabuddi Raghava V.","year":"1994","unstructured":"Raghava V. Cherabuddi and Magdy A. Bayoumi. 1994. Automated system partitioning for synthesis of multi-chip modules. In 4th Great Lakes Symposium on VLSI. IEEE, 15\u201320."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218680"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240850"},{"key":"e_1_3_2_19_2","article-title":"Extending high-level synthesis for task-parallel programs","author":"Chi Yuze","year":"2020","unstructured":"Yuze Chi, Licheng Guo, Young-kyu Choi, Jie Wang, and Jason Cong. 2020. Extending high-level synthesis for task-parallel programs. arXiv preprint arXiv:2009.11389 (2020).","journal-title":"arXiv preprint arXiv:2009.11389"},{"key":"e_1_3_2_20_2","first-page":"190","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Chi Yuze","year":"2022","unstructured":"Yuze Chi, Licheng Guo, and Jason Cong. 2022. Accelerating SSSP for power-law graphs. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 
190\u2013200."},{"key":"e_1_3_2_21_2","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Choi Young-kyu","year":"2021","unstructured":"Young-kyu Choi, Yuze Chi, Weikang Qiao, Nikola Samardzic, and Jason Cong. 2021. HBM Connect: High-performance HLS interconnect for FPGA HBM. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays."},{"key":"e_1_3_2_22_2","article-title":"When HLS meets FPGA HBM: Benchmarking and bandwidth optimization","author":"Choi Young-kyu","year":"2020","unstructured":"Young-kyu Choi, Yuze Chi, Jie Wang, Licheng Guo, and Jason Cong. 2020. When HLS meets FPGA HBM: Benchmarking and bandwidth optimization. arXiv preprint arXiv:2010.06075 (2020).","journal-title":"arXiv preprint arXiv:2010.06075"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/1278480.1278586"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2004.825872"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240838"},{"key":"e_1_3_2_26_2","first-page":"125","volume-title":"IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines","author":"Cong Jason","year":"2018","unstructured":"Jason Cong, Peng Wei, Cody Hao Yu, and Peipei Zhou. 2018. Latte: Locality aware transformation for high-level synthesis. In IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines. IEEE, 125\u2013128."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/1146909.1147025"},{"key":"e_1_3_2_28_2","first-page":"595","volume-title":"ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA\u201921)","author":"Dadu Vidushi","year":"2021","unstructured":"Vidushi Dadu, Sihao Liu, and Tony Nowatzki. 2021. PolyGraph: Exposing the value of flexibility for graph processing accelerators. In ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA\u201921). 
IEEE, 595\u2013608."},{"key":"e_1_3_2_29_2","doi-asserted-by":"crossref","first-page":"924","DOI":"10.1145\/3352460.3358276","volume-title":"52nd Annual IEEE\/ACM International Symposium on Microarchitecture","author":"Dadu Vidushi","year":"2019","unstructured":"Vidushi Dadu, Jian Weng, Sihao Liu, and Tony Nowatzki. 2019. Towards general purpose acceleration by exploiting common data-dependence forms. In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture. 924\u2013939."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/99.660313"},{"key":"e_1_3_2_31_2","volume-title":"International Symposium on Field-Programmable Gate Arrays","author":"Dai Guohao","year":"2016","unstructured":"Guohao Dai, Yuze Chi, Yu Wang, and Huazhong Yang. 2016. FPGP: Graph processing framework on FPGA\u2014A case study of breadth-first search. In International Symposium on Field-Programmable Gate Arrays."},{"key":"e_1_3_2_32_2","volume-title":"International Symposium on Field-Programmable Gate Arrays","author":"Dai Guohao","year":"2017","unstructured":"Guohao Dai, Tianhao Huang, Yuze Chi, Ningyi Xu, Yu Wang, and Huazhong Yang. 2017. ForeGraph: Exploring large-scale graph processing on multi-FPGA architecture. In International Symposium on Field-Programmable Gate Arrays."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.1985.1270101"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACSD.2006.33"},{"key":"e_1_3_2_35_2","first-page":"471","article-title":"The semantics of a simple language for parallel programming","volume":"74","author":"Kahn Gilles","year":"1974","unstructured":"Gilles Kahn. 1974. The semantics of a simple language for parallel programming. Inf. Process. 74 (1974), 471\u2013475.","journal-title":"Inf. 
Process."},{"key":"e_1_3_2_36_2","volume-title":"International Symposium on Field-Programmable Gate Arrays","author":"Guo Licheng","year":"2021","unstructured":"Licheng Guo, Yuze Chi, Jie Wang, Jason Lau, Weikang Qiao, Ecenur Ustun, Zhiru Zhang, and Jason Cong. 2021. AutoBridge: Coupling coarse-grained floorplanning and pipelining for high-frequency HLS design on multi-die FPGAs. In International Symposium on Field-Programmable Gate Arrays."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218718"},{"key":"e_1_3_2_38_2","first-page":"127","volume-title":"IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201919)","author":"Guo Licheng","year":"2019","unstructured":"Licheng Guo, Jason Lau, Zhenyuan Ruan, Peng Wei, and Jason Cong. 2019. Hardware acceleration of long read pairwise overlapping in genome sequencing: A race between FPGA and GPU. In IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201919). IEEE, 127\u2013135."},{"key":"e_1_3_2_39_2","first-page":"1","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Guo Licheng","year":"2022","unstructured":"Licheng Guo, Pongstorn Maidee, Yun Zhou, Chris Lavin, Jie Wang, Yuze Chi, Weikang Qiao, Alireza Kaviani, Zhiru Zhang, and Jason Cong. 2022. RapidStream: Parallel physical implementation of FPGA HLS designs. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 1\u201312."},{"key":"e_1_3_2_40_2","unstructured":"Gurobi. 2020. Retrieved from https:\/\/www.gurobi.com\/"},{"key":"e_1_3_2_41_2","first-page":"75","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Pereira Andre Hahn","year":"2014","unstructured":"Andre Hahn Pereira and Vaughn Betz. 2014. CAD and routing architecture for interposer-based multi-FPGA systems. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 
75\u201384."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/359576.359585"},{"key":"e_1_3_2_43_2","unstructured":"Intel. 2020. Intel Stratix 10 FPGA. Retrieved from https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/stratix-10\/s10-overview.pdf"},{"key":"e_1_3_2_44_2","unstructured":"Intel-OpenCL-Examples. 2020. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/programmable\/products\/design-software\/embedded-software-developers\/opencl\/support.html"},{"key":"e_1_3_2_45_2","first-page":"127","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Josipovi\u0107 Lana","year":"2018","unstructured":"Lana Josipovi\u0107, Radhika Ghosal, and Paolo Ienne. 2018. Dynamically scheduled high-level synthesis. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 127\u2013136."},{"key":"e_1_3_2_46_2","first-page":"186","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Josipovi\u0107 Lana","year":"2020","unstructured":"Lana Josipovi\u0107, Shabnam Sheikhha, Andrea Guerrieri, Paolo Ienne, and Jordi Cortadella. 2020. Buffer placement and sizing for high-performance dataflow circuits. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 186\u2013196."},{"key":"e_1_3_2_47_2","volume-title":"International Federation for Information Processing (IFIP\u201974)","author":"Kahn Gilles","year":"1974","unstructured":"Gilles Kahn. 1974. The semantics of a simple language for parallel programming. 
In International Federation for Information Processing (IFIP\u201974)."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/92.748202"},{"key":"e_1_3_2_49_2","first-page":"320","volume-title":"IEEE\/ACM International Conference on Computer Aided Design (ICCAD\u201901)","author":"Kim Daehong","year":"2001","unstructured":"Daehong Kim, Jinyong Jung, Sunghyun Lee, Jinhwan Jeon, and Kiyoung Choi. 2001. Behavior-to-placed RTL synthesis with performance-driven placement. In IEEE\/ACM International Conference on Computer Aided Design (ICCAD\u201901). IEEE, 320\u2013325."},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.1986.1270219"},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1145\/62882.62901","volume-title":"Papers on Twenty-five Years of Electronic Design Automation","author":"Lauther Ulrich","year":"1988","unstructured":"Ulrich Lauther. 1988. A min-cut placement algorithm for general cell assemblies based on a graph representation. In Papers on Twenty-five Years of Electronic Design Automation. 182\u2013191."},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1987.13876"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01759032"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bty191"},{"key":"e_1_3_2_55_2","volume-title":"International Symposium on Field-Programmable Gate Arrays","author":"Li Jiajie","year":"2020","unstructured":"Jiajie Li, Yuze Chi, and Jason Cong. 2020. HeteroHalide: From image processing DSL to efficient FPGA acceleration. In International Symposium on Field-Programmable Gate Arrays."},{"key":"e_1_3_2_56_2","first-page":"227","volume-title":"International Conference on Computer Aided Design","author":"Lu Ruibing","year":"2003","unstructured":"Ruibing Lu and Cheng-Kok Koh. 2003. Performance optimization of latency insensitive systems through buffer queue sizing of communication channels. 
In International Conference on Computer Aided Design. IEEE, 227\u2013231."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2005.854636"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/775832.775984"},{"key":"e_1_3_2_59_2","first-page":"93","volume-title":"International Great Lakes Symposium on VLSI (GLSVLSI\u201916)","author":"Mao Fubing","year":"2016","unstructured":"Fubing Mao, Wei Zhang, Bo Feng, Bingsheng He, and Yuchun Ma. 2016. Modular placement for interposer based multi-FPGA systems. In International Great Lakes Symposium on VLSI (GLSVLSI\u201916). IEEE, 93\u201398."},{"key":"e_1_3_2_60_2","unstructured":"Minimum-Cut. 2020. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Minimum_cut"},{"issue":"13","key":"e_1_3_2_61_2","first-page":"38","article-title":"An automatic floorplanner for up to 100,000 gates","volume":"8","author":"Modarres H.","year":"1987","unstructured":"H. Modarres and A. Kelapure. 1987. An automatic floorplanner for up to 100,000 gates. VLSI Syst. Des. 8, 13 (1987), 38.","journal-title":"VLSI Syst. Des."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2015.7393136"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2015.2478280"},{"key":"e_1_3_2_64_2","volume-title":"International Symposium on Field-programmable Gate Arrays","author":"Papaphilippou Philippos","year":"2020","unstructured":"Philippos Papaphilippou, Jiuxi Meng, and Wayne Luk. 2020. High-performance FPGA network switch architecture. In International Symposium on Field-programmable Gate Arrays."},{"key":"e_1_3_2_65_2","volume-title":"VLSI Digital Signal Processing Systems: Design and Implementation","author":"Parhi Keshab K.","year":"2007","unstructured":"Keshab K. Parhi. 2007. VLSI Digital Signal Processing Systems: Design and Implementation. 
John Wiley & Sons."},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/117009.117015"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/356698.356702"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3107953"},{"key":"e_1_3_2_69_2","first-page":"1","volume-title":"IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201922)","author":"Qiao Weikang","year":"2022","unstructured":"Weikang Qiao, Licheng Guo, Zhenman Fang, Mau-Chung Frank Chang, and Jason Cong. 2022. TopSort: A high-performance two-phase sorting accelerator optimized on HBM-based FPGAs. In IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201922). IEEE, 1\u20131."},{"key":"e_1_3_2_70_2","first-page":"106","volume-title":"IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201921)","author":"Qiao Weikang","year":"2021","unstructured":"Weikang Qiao, Jihun Oh, Licheng Guo, Mau-Chung Frank Chang, and Jason Cong. 2021. FANS: FPGA-accelerated near-storage sorting. In IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201921). IEEE, 106\u2013114."},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.1993.580064"},{"key":"e_1_3_2_72_2","first-page":"282","volume-title":"ACM\/IEEE 47th Annual International Symposium on Computer Architecture","author":"Samardzic Nikola","year":"2020","unstructured":"Nikola Samardzic, Weikang Qiao, Vaibhav Aggarwal, Mau-Chung Frank Chang, and Jason Cong. 2020. Bonsai: High-performance adaptive merge tree sorting. In ACM\/IEEE 47th Annual International Symposium on Computer Architecture. IEEE, 282\u2013294."},{"key":"e_1_3_2_73_2","unstructured":"H. G. Santos and T. A. M. Toffolo. 2020. Python MIP (Mixed-Integer Linear Programming) Tools. 
Retrieved from https:\/\/pypi.org\/project\/mip\/"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/CICC53496.2022.9772832"},{"key":"e_1_3_2_75_2","first-page":"133","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Sohrabizadeh Atefeh","year":"2020","unstructured":"Atefeh Sohrabizadeh, Jie Wang, and Jason Cong. 2020. End-to-end optimization of deep learning applications. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 133\u2013139."},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530420"},{"key":"e_1_3_2_77_2","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Song Linghao","year":"2022","unstructured":"Linghao Song, Yuze Chi, Atefeh Sohrabizadeh, Young-kyu Choi, Jason Lau, and Jason Cong. 2022. Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays."},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2003.159736"},{"key":"e_1_3_2_79_2","unstructured":"Synopsys. 2020. Retrieved from https:\/\/www.synopsys.com\/"},{"key":"e_1_3_2_80_2","first-page":"190","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Tan Mingxing","year":"2015","unstructured":"Mingxing Tan, Steve Dai, Udit Gupta, and Zhiru Zhang. 2015. Mapping-aware constrained scheduling for LUT-based FPGAs. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 190\u2013199."},{"key":"e_1_3_2_81_2","unstructured":"Xingyu Tian, Zhifan Ye, Alec Lu, Licheng Guo, Yuze Chi, and Zhenman Fang. 2022. SASA: A Scalable and Automatic Stencil Acceleration Framework for Optimized Hybrid Spatial and Temporal Parallelism on HBM-based FPGAs. 
arxiv:cs.AR\/2208.10770"},{"key":"e_1_3_2_82_2","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1109\/FCCM.2014.30","volume-title":"IEEE 22nd Annual International Symposium on Field-programmable Custom Computing Machines","author":"Venkataramani Girish","year":"2014","unstructured":"Girish Venkataramani and Yongfeng Gu. 2014. System-level retiming and pipelining. In IEEE 22nd Annual International Symposium on Field-programmable Custom Computing Machines. IEEE, 80\u201387."},{"key":"e_1_3_2_83_2","first-page":"78","volume-title":"IEEE 27th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM\u201919)","author":"Voss Nils","year":"2019","unstructured":"Nils Voss, Pablo Quintana, Oskar Mencer, Wayne Luk, and Georgi Gaydadjiev. 2019. Memory mapping for multi-die FPGAs. In IEEE 27th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM\u201919). IEEE, 78\u201386."},{"key":"e_1_3_2_84_2","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Wang Jie","year":"2021","unstructured":"Jie Wang, Licheng Guo, and Jason Cong. 2021. AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays."},{"key":"e_1_3_2_85_2","volume-title":"IEEE Annual International Symposium on Field-programmable Custom Computing Machines (FCCM\u201919)","author":"Wang Yu","year":"2019","unstructured":"Yu Wang, James C. Hoe, and Eriko Nurvitadhi. 2019. Processor assisted worklist scheduling for FPGA accelerated graph processing on a shared-memory platform. 
In IEEE Annual International Symposium on Field-programmable Custom Computing Machines (FCCM\u201919)."},{"key":"e_1_3_2_86_2","first-page":"268","volume-title":"ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920)","author":"Weng Jian","year":"2020","unstructured":"Jian Weng, Sihao Liu, Vidushi Dadu, Zhengrong Wang, Preyas Shah, and Tony Nowatzki. 2020. DSAGEN: Synthesizing programmable spatial accelerators. In ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920). IEEE, 268\u2013281."},{"key":"e_1_3_2_87_2","first-page":"703","volume-title":"IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Weng Jian","year":"2020","unstructured":"Jian Weng, Sihao Liu, Zhengrong Wang, Vidushi Dadu, and Tony Nowatzki. 2020. A hybrid systolic-dataflow architecture for inductive matrix algorithms. In IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920). IEEE, 703\u2013716."},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT51103.2020.00035"},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507740"},{"key":"e_1_3_2_90_2","unstructured":"Xilinx. 2020. Vivado Design Suite. Retrieved from https:\/\/www.xilinx.com\/products\/design-tools\/vivado.html"},{"key":"e_1_3_2_91_2","unstructured":"Xilinx. 2020. Xilinx UltraScale Plus Architecture. Retrieved from https:\/\/www.xilinx.com\/products\/silicon-devices\/fpga\/virtex-ultrascale-plus.html"},{"key":"e_1_3_2_92_2","unstructured":"Xilinx. 2020. Xilinx Vitis Unified Platform. Retrieved from https:\/\/www.xilinx.com\/products\/design-tools\/vitis\/vitis-platform.html"},{"key":"e_1_3_2_93_2","unstructured":"Xilinx-HBM. 2020. Retrieved from https:\/\/www.xilinx.com\/products\/silicon-devices\/fpga\/virtex-ultrascale-plus-hbm.html"},{"key":"e_1_3_2_94_2","unstructured":"Xilinx-Vitis-Library. 2020. 
Retrieved from https:\/\/github.com\/Xilinx\/Vitis_Libraries"},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1145\/268424.268425"},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378491"},{"key":"e_1_3_2_97_2","first-page":"1","volume-title":"IEEE International Symposium on Circuits and Systems (ISCAS\u201919)","author":"Zhang Jiaxi","year":"2019","unstructured":"Jiaxi Zhang, Wentai Zhang, Guojie Luo, Xuechao Wei, Yun Liang, and Jason Cong. 2019. Frequency improvement of systolic array-based CNNs on FPGAs. In IEEE International Symposium on Circuits and Systems (ISCAS\u201919). IEEE, 1\u20134."},{"key":"e_1_3_2_98_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2019.8714724"},{"key":"e_1_3_2_99_2","first-page":"1","volume-title":"ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Zheng Hongbin","year":"2014","unstructured":"Hongbin Zheng, Swathi T. Gurumani, Kyle Rupnow, and Deming Chen. 2014. Fast and effective placement and routing directed high-level synthesis for FPGAs. In ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 
1\u201310."},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2910068"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609335","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3609335","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:22Z","timestamp":1750178782000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609335"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,5]]},"references-count":99,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3609335"],"URL":"https:\/\/doi.org\/10.1145\/3609335","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,5]]},"assertion":[{"value":"2022-08-25","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-05","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}