{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,24]],"date-time":"2026-07-24T02:34:56Z","timestamp":1784860496245,"version":"3.55.0"},"reference-count":177,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,8,8]],"date-time":"2022-08-08T00:00:00Z","timestamp":1659916800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"CRISP"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2022,12,31]]},"abstract":"<jats:p>The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it went from prototyping to deployment. A decade later, in this article, we assess the progress of the deployment of HLS technology and highlight the successes in several application domains, including deep learning, video transcoding, graph processing, and genome sequencing. We also discuss the challenges faced by today\u2019s HLS technology and the opportunities for further research and development, especially in the areas of achieving high clock frequency, coping with complex pragmas and system integration, legacy code transformation, building on open source HLS infrastructures, supporting domain-specific languages, and standardization. 
It is our hope that this article will inspire more research on FPGA HLS and bring it to a new height.<\/jats:p>","DOI":"10.1145\/3530775","type":"journal-article","created":{"date-parts":[[2022,4,21]],"date-time":"2022-04-21T12:51:11Z","timestamp":1650545471000},"page":"1-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":164,"title":["FPGA HLS Today: Successes, Challenges, and Opportunities"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2887-6963","authenticated-orcid":false,"given":"Jason","family":"Cong","sequence":"first","affiliation":[{"name":"University of California, Los Angeles, California"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0751-8227","authenticated-orcid":false,"given":"Jason","family":"Lau","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles, California"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8538-686X","authenticated-orcid":false,"given":"Gai","family":"Liu","sequence":"additional","affiliation":[{"name":"Xilinx Inc., San Jose, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2956-8428","authenticated-orcid":false,"given":"Stephen","family":"Neuendorffer","sequence":"additional","affiliation":[{"name":"Xilinx Inc., San Jose, CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2947-8991","authenticated-orcid":false,"given":"Peichen","family":"Pan","sequence":"additional","affiliation":[{"name":"Falcon Computing Solutions Inc., CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6249-315X","authenticated-orcid":false,"given":"Kees","family":"Vissers","sequence":"additional","affiliation":[{"name":"Xilinx Inc., San Jose, 
CA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0778-0308","authenticated-orcid":false,"given":"Zhiru","family":"Zhang","sequence":"additional","affiliation":[{"name":"Cornell University, Ithaca, NY"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,8,8]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2011.2110592"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281665"},{"key":"e_1_3_2_4_2","first-page":"433","article-title":"An efficient and versatile scheduling algorithm based on SDC formulation","author":"Cong Jason","year":"2006","unstructured":"Jason Cong and Zhiru Zhang. 2006. An efficient and versatile scheduling algorithm based on SDC formulation. In Proceedings of the Design Automation Conference (DAC\u201906).433\u2013438.","journal-title":"Proceedings of the Design Automation Conference (DAC\u201906)."},{"key":"e_1_3_2_5_2","first-page":"211","article-title":"SDC-based modulo scheduling for pipeline synthesis","author":"Zhang Zhiru","year":"2013","unstructured":"Zhiru Zhang and Bin Liu. 2013. SDC-based modulo scheduling for pipeline synthesis. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201913).211\u2013218.","journal-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201913)."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/1929943.1929947"},{"key":"e_1_3_2_7_2","unstructured":"BDTI. n.d. BDTI Certified Results for the AutoESL AutoPilot High-Level Synthesis Tool. Retrieved July 27 2021 from https:\/\/www.bdti.com\/Resources\/BenchmarkResults\/HLSTCP\/AutoPilot."},{"key":"e_1_3_2_8_2","unstructured":"Xilinx. n.d. Vivado Design Suite User Guide: High-Level Synthesis UG902 (v2012.2). 
Retrieved July 28 2021 from https:\/\/www.xilinx.com\/support\/documentation\/sw_manuals\/xilinx2012_2\/ug902-vivado-high-level-synthesis.pdf."},{"key":"e_1_3_2_9_2","unstructured":"ChipEstimate.com. n.d. Xilinx Unveils Vivado Design Suite for the Next Decade of \u2018All Programmable\u2019 Devices. Retrieved July 28 2021 from https:\/\/www.chipestimate.com\/Xilinx-Unveils-Vivado-Design-Suite-for-the-Next-Decade-of-\/Xilinx\/Technical-Article\/2012\/06\/12."},{"key":"e_1_3_2_10_2","unstructured":"M. Sussmann and T. Hill. 2017. Intel HLS Compiler: Fast Design Coding and Hardware. https:\/\/www.altera.com\/content\/dam\/altera-www\/global\/en_US\/pdfs\/literature\/wp\/wp-01274-intel-hls-compiler-fast-design-coding-and-hardware.pdf."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847276"},{"key":"e_1_3_2_13_2","first-page":"152","article-title":"FP-DNN: An automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates","author":"Guan Yijin","year":"2017","unstructured":"Yijin Guan, Hao Liang, Ningyi Xu, Wenqiang Wang, Shaoshuai Shi, Xi Chen, Guangyu Sun, Wei Zhang, and Jason Cong. 2017. FP-DNN: An automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201917).152\u2013159.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201917)."},{"key":"e_1_3_2_14_2","first-page":"85","article-title":"Customizing neural networks for efficient FPGA implementation","author":"Samragh Mohammad","year":"2017","unstructured":"Mohammad Samragh, Mohammad Ghasemzadeh, and Farinaz Koushanfar. 2017. Customizing neural networks for efficient FPGA implementation. 
In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201917).85\u201392.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201917)."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021738"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021741"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174253"},{"key":"e_1_3_2_19_2","first-page":"317","article-title":"A scalable OpenCL-based FPGA accelerator for YOLOv2","author":"Xu Ke","year":"2019","unstructured":"Ke Xu, Xiaoyun Wang, and Dong Wang. 2019. A scalable OpenCL-based FPGA accelerator for YOLOv2. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919).317\u2013317.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919)."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293902"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293915"},{"key":"e_1_3_2_22_2","first-page":"231","article-title":"Systolic-CNN: An OpenCL-defined scalable run-time-flexible FPGA accelerator architecture for accelerating convolutional neural network inference in cloud\/edge computing","author":"Dua Akshay","year":"2020","unstructured":"Akshay Dua, Yixing Li, and Fengbo Ren. 2020. Systolic-CNN: An OpenCL-defined scalable run-time-flexible FPGA accelerator architecture for accelerating convolutional neural network inference in cloud\/edge computing. 
In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201920).231\u2013231.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201920)."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439296"},{"key":"e_1_3_2_24_2","first-page":"199","article-title":"A novel high-throughput acceleration engine for read alignment","author":"Chen Yu-Ting","year":"2015","unstructured":"Yu-Ting Chen, Jason Cong, Jie Lei, and Peng Wei. 2015. A novel high-throughput acceleration engine for read alignment. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201915).199\u2013202.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201915)."},{"key":"e_1_3_2_25_2","first-page":"137","article-title":"Acceleration of the Pair-HMM algorithm for DNA variant calling","author":"Manikandan Gowthami Jayashri","year":"2016","unstructured":"Gowthami Jayashri Manikandan, Sitao Huang, Kyle Rupnow, Wen-Mei W. Hwu, and Deming Chen. 2016. Acceleration of the Pair-HMM algorithm for DNA variant calling. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201916).137\u2013137.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201916)."},{"key":"e_1_3_2_26_2","article-title":"Hardware acceleration of long read pairwise overlapping in genome sequencing: A race between FPGA and GPU","author":"Guo L.","year":"2019","unstructured":"L. Guo, J. Lau, Z. Ruan, P. Wei, and J. Cong. 2019. Hardware acceleration of long read pairwise overlapping in genome sequencing: A race between FPGA and GPU. 
In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919).","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919)"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM48280.2020.00029"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375300"},{"key":"e_1_3_2_29_2","article-title":"SMEM++: A pipelined and time-multiplexed SMEM seeding accelerator for DNA sequencing","author":"Cong Jason","year":"2018","unstructured":"Jason Cong, Licheng Guo, Po-Tsang Huang, Peng Wei, and Tianhe Yu. 2018. SMEM++: A pipelined and time-multiplexed SMEM seeding accelerator for DNA sequencing. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201918).","journal-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201918)."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847268"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439298"},{"key":"e_1_3_2_32_2","first-page":"1","article-title":"A scalable, high-performance customized priority queue","author":"Huang Muhuan","year":"2014","unstructured":"Muhuan Huang, Kevin Lim, and Jason Cong. 2014. A scalable, high-performance customized priority queue. 
In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201914).1\u20134.","journal-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201914)."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM51124.2021.00020"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439290"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439291"},{"key":"e_1_3_2_36_2","first-page":"157","article-title":"FPGA implementation of EM algorithm for 3D CT reconstruction","author":"Choi Young Kyu","year":"2014","unstructured":"Young Kyu Choi, Jason Cong, and Di Wu. 2014. FPGA implementation of EM algorithm for 3D CT reconstruction. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201914).157\u2013160.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201914)."},{"key":"e_1_3_2_37_2","first-page":"47","article-title":"FFShark: A 100G FPGA implementation of BPF filtering for Wireshark","author":"Vega Juan Camilo","year":"2020","unstructured":"Juan Camilo Vega, Marco Antonio Merlini, and Paul Chow. 2020. FFShark: A 100G FPGA implementation of BPF filtering for Wireshark. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201920).47\u201355.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201920)."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021730"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174248"},{"key":"e_1_3_2_40_2","first-page":"337","article-title":"Large-scale and high-throughput QR decomposition on an FPGA","author":"Lee Dajung","year":"2019","unstructured":"Dajung Lee, Andrei Hagiescu, and Dan Pritsker. 2019. 
Large-scale and high-throughput QR decomposition on an FPGA. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919).337\u2013337.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919)."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375298"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439292"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490422.3502357"},{"key":"e_1_3_2_44_2","first-page":"56","article-title":"Hardware architecture of a number theoretic transform for a bootstrappable RNS-based homomorphic encryption scheme","author":"Kim Sunwoong","year":"2020","unstructured":"Sunwoong Kim, Keewoo Lee, Wonhee Cho, Yujin Nam, Jung Hee Cheon, and Rob A. Rutenbar. 2020. Hardware architecture of a number theoretic transform for a bootstrappable RNS-based homomorphic encryption scheme. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201920).56\u201364.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201920)."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847274"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2513673"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3024098"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3039409"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3469660"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2012.2221191"},{"key":"e_1_3_2_51_2","unstructured":"Xilinx. n.d. NGCodec Hardware HEVC Encoding (UG1408). 
Retrieved April 29 2022 from https:\/\/www.xilinx.com\/publications\/user-guide\/partner\/ug1408-ngcodec-hevc.pdf."},{"key":"e_1_3_2_52_2","first-page":"1","article-title":"In-datacenter performance analysis of a tensor processing unit","author":"Jouppi Norman P.","year":"2017","unstructured":"Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, et\u00a0al. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201917).1\u201312.","journal-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201917)."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1145\/3020078.3021741","article-title":"Accelerating binarized convolutional neural networks with software-programmable FPGAs","author":"Zhao Ritchie","year":"2017","unstructured":"Ritchie Zhao, Weinan Song, Wentao Zhang, Tianwei Xing, Jeng-Hau Lin, Mani Srivastava, Rajesh Gupta, and Zhiru Zhang. 2017. Accelerating binarized convolutional neural networks with software-programmable FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201917).15\u201324.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201917)."},{"key":"e_1_3_2_54_2","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1145\/3289602.3293902","article-title":"Synetgy: Algorithm-hardware co-design for convnet accelerators on embedded FPGAs","author":"Yang Yifan","year":"2019","unstructured":"Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, and Kurt Keutzer. 2019. Synetgy: Algorithm-hardware co-design for convnet accelerators on embedded FPGAs. 
In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201919).23\u201332.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201919)."},{"key":"e_1_3_2_55_2","first-page":"1","article-title":"FPGA\/DNN co-design: An efficient design methodology for IoT intelligence on the edge","author":"Hao Cong","year":"2019","unstructured":"Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, Jinjun Xiong, Kyle Rupnow, Wen-Mei Hwu, and Deming Chen. 2019. FPGA\/DNN co-design: An efficient design methodology for IoT intelligence on the edge. In Proceedings of the Design Automation Conference (DAC\u201919).1\u20136.","journal-title":"Proceedings of the Design Automation Conference (DAC\u201919)."},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1145\/2847263.2847276","article-title":"Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks","author":"Suda Naveen","year":"2016","unstructured":"Naveen Suda, Vikas Chandra, Ganesh Dasika, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, and Yu Cao. 2016. Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201916).16\u201325.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201916)."},{"key":"e_1_3_2_57_2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1145\/3020078.3021698","article-title":"Improving the performance of OpenCL-based FPGA accelerator for convolutional neural network","author":"Zhang Jialiang","year":"2017","unstructured":"Jialiang Zhang and Jing Li. 2017. Improving the performance of OpenCL-based FPGA accelerator for convolutional neural network. 
In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201917).25\u201334.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201917)."},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1145\/3020078.3021738","article-title":"An OpenCL deep learning accelerator on Arria 10","author":"Aydonat Utku","year":"2017","unstructured":"Utku Aydonat, Shane O\u2019Connell, Davor Capalija, Andrew C. Ling, and Gordon R. Chiu. 2017. An OpenCL deep learning accelerator on Arria 10. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201917).55\u201364.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201917)."},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3242897"},{"key":"e_1_3_2_60_2","unstructured":"GitHub. n.d. Brevitas: A PyTorch Library for Quantization-Aware Training. Retrieved April 29 2022 from https:\/\/github.com\/Xilinx\/brevitas."},{"key":"e_1_3_2_61_2","unstructured":"GitHub. n.d. FINN Code Example. Retrieved April 29 2022 from https:\/\/github.com\/Xilinx\/finn-hlslib\/blob\/vitis-hls\/mvau.hpp#L147-L179."},{"key":"e_1_3_2_62_2","unstructured":"Fast Machine Learning Lab. n.d. hls4ml. Retrieved April 29 2022 from https:\/\/fastmachinelearning.org\/hls4ml\/reference.html."},{"key":"e_1_3_2_63_2","article-title":"Deep residual learning for image recognition","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. 
In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR\u201916).","journal-title":"Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR\u201916)."},{"key":"e_1_3_2_64_2","article-title":"ShuffleNet: An extremely efficient convolutional neural network for mobile devices","author":"Zhang Xiangyu","year":"2018","unstructured":"Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2018. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR\u201918).","journal-title":"Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR\u201918)."},{"key":"e_1_3_2_65_2","article-title":"MobileNetV2: Inverted residuals and linear bottlenecks","author":"Sandler Mark","year":"2018","unstructured":"Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR\u201918).","journal-title":"Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR\u201918)."},{"key":"e_1_3_2_66_2","article-title":"ReActNet: Towards precise binary neural network with generalized activation functions","author":"Liu Zechun","year":"2020","unstructured":"Zechun Liu, Zhiqiang Shen, Marios Savvides, and Kwang-Ting Cheng. 2020. ReActNet: Towards precise binary neural network with generalized activation functions. 
In Proceedings of the European Conference on Computer Vision.","journal-title":"Proceedings of the European Conference on Computer Vision."},{"key":"e_1_3_2_67_2","article-title":"Serving DNNs in real time at datacenter scale with project brainwave","author":"Chung Eric","year":"2018","unstructured":"Eric Chung, Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Adrian Caulfield, Todd Massengill, Ming Liu, et\u00a0al. 2018. Serving DNNs in real time at datacenter scale with project brainwave. In Proceedings of the International Symposium on Microarchitecture (MICRO\u201918).","journal-title":"Proceedings of the International Symposium on Microarchitecture (MICRO\u201918)."},{"key":"e_1_3_2_68_2","article-title":"DLA: Compiler and FPGA overlay for neural network inference acceleration","author":"Abdelfattah Mohamed S.","year":"2018","unstructured":"Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane O\u2019Connell, Nitika Shanker, Joseph Chu, et\u00a0al. 2018. DLA: Compiler and FPGA overlay for neural network inference acceleration. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201918).","journal-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201918)."},{"key":"e_1_3_2_69_2","first-page":"1","article-title":"Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs","author":"Wei Xuechao","year":"2017","unstructured":"Xuechao Wei, Cody Hao Yu, Peng Zhang, Youxiang Chen, Yuxin Wang, Han Hu, Yun Liang, and Jason Cong. 2017. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. 
In Proceedings of the Design Automation Conference (DAC\u201917).1\u20136.","journal-title":"Proceedings of the Design Automation Conference (DAC\u201917)."},{"key":"e_1_3_2_70_2","first-page":"1","article-title":"PolySA: Polyhedral-based systolic array auto-compilation","author":"Cong Jason","year":"2018","unstructured":"Jason Cong and Jie Wang. 2018. PolySA: Polyhedral-based systolic array auto-compilation. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201918).1\u20138.","journal-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201918)."},{"key":"e_1_3_2_71_2","first-page":"181","article-title":"T2S-Tensor: Productively generating high-performance spatial hardware for dense tensor computations","author":"Srivastava Nitish","year":"2019","unstructured":"Nitish Srivastava, Hongbo Rong, Prithayan Barua, Guanyu Feng, Huanqi Cao, Zhiru Zhang, David Albonesi, et\u00a0al. 2019. T2S-Tensor: Productively generating high-performance spatial hardware for dense tensor computations. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919).181\u2013189.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919)."},{"key":"e_1_3_2_72_2","first-page":"1","article-title":"SuSy: A programming model for productive construction of high-performance systolic arrays on FPGAs","author":"Lai Yi-Hsiang","year":"2020","unstructured":"Yi-Hsiang Lai, Hongbo Rong, Size Zheng, Weihao Zhang, Xiuping Cui, Yunshan Jia, Jie Wang, et\u00a0al. 2020. SuSy: A programming model for productive construction of high-performance systolic arrays on FPGAs. 
In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201920).1\u20139.","journal-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201920)."},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2785257"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/2966986.2967011"},{"key":"e_1_3_2_75_2","first-page":"1","article-title":"DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs","author":"Zhang Xiaofan","year":"2018","unstructured":"Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-Mei Hwu, and Deming Chen. 2018. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201918).1\u20138.","journal-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201918)."},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375321"},{"key":"e_1_3_2_77_2","first-page":"129","article-title":"Fast and accurate estimation of quality of results in high-level synthesis with machine learning","author":"Dai Steve","year":"2018","unstructured":"Steve Dai, Yuan Zhou, Hang Zhang, Ecenur Ustun, Evangeline F. Y. Young, and Zhiru Zhang. 2018. Fast and accurate estimation of quality of results in high-level synthesis with machine learning. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201918).129\u2013132.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201918)."},{"key":"e_1_3_2_78_2","unstructured":"Xilinx. n.d. Versal ACAP AI Engine Programming Environment UG1076 (v2021.1). 
Retrieved April 29 2022 from https:\/\/www.xilinx.com\/support\/documentation\/sw_manuals\/xilinx2021_1\/ug1076-ai-engine-environment.pdf."},{"key":"e_1_3_2_79_2","unstructured":"Xilinx. 2019. Virtex UltraScale+ HBM FPGA: A Revolutionary Increase in Memory Performance. Retrieved April 29 2022 from https:\/\/www.xilinx.com\/support\/documentation\/white_papers\/wp485-hbm.pdf."},{"key":"e_1_3_2_80_2","first-page":"105","article-title":"FPGP: Graph processing framework on FPGA\u2014A case study of breadth-first search","author":"Dai Guohao","year":"2016","unstructured":"Guohao Dai, Yuze Chi, Yu Wang, and Huazhong Yang. 2016. FPGP: Graph processing framework on FPGA\u2014A case study of breadth-first search. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201916).105\u2013110.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201916)."},{"key":"e_1_3_2_81_2","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1145\/3174243.3174245","article-title":"Degree-aware hybrid graph traversal on FPGA-HMC platform","author":"Zhang Jialiang","year":"2018","unstructured":"Jialiang Zhang and Jing Li. 2018. Degree-aware hybrid graph traversal on FPGA-HMC platform. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201918).229\u2013238.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201918)."},{"key":"e_1_3_2_82_2","article-title":"Optimizing memory performance for FPGA implementation of PageRank","author":"Zhou Shijie","year":"2015","unstructured":"Shijie Zhou, Charalampos Chelmis, and Viktor K. Prasanna. 2015. Optimizing memory performance for FPGA implementation of PageRank. 
In Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig\u201915).","journal-title":"Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig\u201915)."},{"key":"e_1_3_2_83_2","first-page":"378","article-title":"A reduced-precision streaming SpMV architecture for personalized PageRank on FPGA","author":"Parravicini Alberto","year":"2021","unstructured":"Alberto Parravicini, Francesco Sgherzi, and Marco D. Santambrogio. 2021. A reduced-precision streaming SpMV architecture for personalized PageRank on FPGA. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC\u201921).378\u2013383.","journal-title":"Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC\u201921)."},{"key":"e_1_3_2_84_2","doi-asserted-by":"crossref","unstructured":"Shijie Zhou, Charalampos Chelmis, and Viktor K. Prasanna. 2015. Accelerating large-scale single-source shortest path on FPGA. In Proceedings of the International Parallel and Distributed Processing Symposium Workshop (IPDPSW\u201915).129\u2013136.","DOI":"10.1109\/IPDPSW.2015.130"},{"key":"e_1_3_2_85_2","article-title":"Accelerating SSSP for power-law graphs","author":"Chi Yuze","year":"2022","unstructured":"Yuze Chi, Licheng Guo, and Jason Cong. 2022. Accelerating SSSP for power-law graphs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201922).","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201922)."},{"key":"e_1_3_2_86_2","article-title":"ENIAD: A reconfigurable near-data processing architecture for web-scale AI-enriched big data service","author":"Zhang Jialiang","year":"2021","unstructured":"Jialiang Zhang and Jing Li. 2021. ENIAD: A reconfigurable near-data processing architecture for web-scale AI-enriched big data service. 
In Proceedings of the Hot Chips Symposium.","journal-title":"Proceedings of the Hot Chips Symposium."},{"key":"e_1_3_2_87_2","article-title":"GraphGen: An FPGA framework for vertex-centric graph computation","author":"Nurvitadhi Eriko","year":"2014","unstructured":"Eriko Nurvitadhi, Gabriel Weisz, Yu Wang, Skand Hurkat, Marie Nguyen, James C. Hoe, Jos\u00e9 F Mart\u00ednez, and Carlos Guestrin. 2014. GraphGen: An FPGA framework for vertex-centric graph computation. In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201914).","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201914)."},{"key":"e_1_3_2_88_2","doi-asserted-by":"crossref","DOI":"10.1145\/2847263.2847337","article-title":"GraphOps: A dataflow library for graph analytics acceleration","author":"Oguntebi Tayo","year":"2016","unstructured":"Tayo Oguntebi and Kunle Olukotun. 2016. GraphOps: A dataflow library for graph analytics acceleration. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201916).","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201916)."},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2910068"},{"key":"e_1_3_2_90_2","first-page":"17","article-title":"PowerGraph: Distributed graph-parallel computation on natural graphs","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. 
In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201912).17\u201330.","journal-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201912)."},{"key":"e_1_3_2_91_2","article-title":"GraphLily: Accelerating graph linear algebra on HBM-equipped FPGAs","author":"Hu Yuwei","year":"2021","unstructured":"Yuwei Hu, Yixiao Du, Ecenur Ustun, and Zhiru Zhang. 2021. GraphLily: Accelerating graph linear algebra on HBM-equipped FPGAs. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201921).","journal-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201921)."},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2016.7761646"},{"key":"e_1_3_2_93_2","doi-asserted-by":"crossref","DOI":"10.2172\/7093021","volume-title":"ITPACKV 2D User\u2019s Guide","author":"Kincaid David R.","year":"1989","unstructured":"David R. Kincaid, Thomas C. Oppe, and David M. Young. 1989. ITPACKV 2D User\u2019s Guide. Technical Report. Center for Numerical Analysis, Texas University, Austin."},{"key":"e_1_3_2_94_2","unstructured":"GitHub. n.d. Xilinx Runtime Library (XRT). Retrieved August 20 2021 from https:\/\/github.com\/Xilinx\/XRT."},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218718"},{"key":"e_1_3_2_96_2","first-page":"215","volume-title":"Genetic Variation","author":"Ng Pauline C.","year":"2010","unstructured":"Pauline C. Ng and Ewen F. Kirkness. 2010. Whole genome sequencing. In Genetic Variation. 
Springer, 215\u2013226."},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","DOI":"10.1093\/bfgp\/elr035"},{"key":"e_1_3_2_98_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ygeno.2010.03.001"},{"key":"e_1_3_2_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2013.2293757"},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1109\/CONECCT.2018.8482378"},{"key":"e_1_3_2_101_2","first-page":"388","article-title":"Accelerated seeding for genome sequence alignment with enumerated radix trees","author":"Subramaniyan Arun","year":"2021","unstructured":"Arun Subramaniyan, Jack Wadden, Kush Goliya, Nathan Ozog, Xiao Wu, Satish Narayanasamy, David Blaauw, and Reetuparna Das. 2021. Accelerated seeding for genome sequence alignment with enumerated radix trees. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201921).388\u2013401.","journal-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201921)."},{"key":"e_1_3_2_102_2","volume-title":"Accelerating Genomics Research with OpenCL and FPGAs","author":"Rauer Chris","year":"2017","unstructured":"Chris Rauer, George S. Powley, Mir Ahsan, and Nicholas Finamore Jr. 2017. Accelerating Genomics Research with OpenCL and FPGAs. Technical Report. Intel Corporation."},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00031"},{"key":"e_1_3_2_104_2","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1109\/FCCM.2012.36","article-title":"Hardware acceleration of short read mapping","author":"Olson Corey B.","year":"2012","unstructured":"Corey B. Olson, Maria Kim, Cooper Clauson, Boris Kogon, Carl Ebeling, Scott Hauck, and Walter L. Ruzzo. 2012. Hardware acceleration of short read mapping. 
In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201912).161\u2013168.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201912)."},{"key":"e_1_3_2_105_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2013.57"},{"key":"e_1_3_2_106_2","first-page":"1","article-title":"Hardware accelerated novel optical de novo assembly for large-scale genomes","author":"Meng Pingfan","year":"2014","unstructured":"Pingfan Meng, Matthew Jacobsen, Motoki Kimura, Vladimir Dergachev, Thomas Anantharaman, Michael Requa, and Ryan Kastner. 2014. Hardware accelerated novel optical de novo assembly for large-scale genomes. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201914).1\u20138.","journal-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201914)."},{"key":"e_1_3_2_107_2","first-page":"277","article-title":"FPGA accelerated INDEL realignment in the cloud","author":"Wu Lisa","year":"2019","unstructured":"Lisa Wu, David Bruns-Smith, Frank A. Nothaft, Qijing Huang, Sagar Karandikar, Johnny Le, Andrew Lin, et\u00a0al. 2019. FPGA accelerated INDEL realignment in the cloud. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA\u201919).277\u2013290.","journal-title":"Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA\u201919)."},{"key":"e_1_3_2_108_2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1145\/3020078.3021787","article-title":"CPU-FPGA co-optimization for big data applications: A case study of in-memory Samtool sorting","author":"Cong Jason","year":"2017","unstructured":"Jason Cong, Zhenman Fang, Muhuan Huang, Libo Wang, and Di Wu. 2017. CPU-FPGA co-optimization for big data applications: A case study of in-memory Samtool sorting. 
In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201917).291\u2013291.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201917)."},{"key":"e_1_3_2_109_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bty191"},{"key":"e_1_3_2_110_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-6037-4_8"},{"key":"e_1_3_2_111_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12864-019-5475-x"},{"issue":"18","key":"e_1_3_2_112_2","first-page":"8","article-title":"Nvidia CUDA C programming guide","volume":"120","year":"2011","unstructured":"Nvidia. 2011. Nvidia CUDA C programming guide. Nvidia Corporation 120, 18 (2011), 8.","journal-title":"Nvidia Corporation"},{"key":"e_1_3_2_113_2","volume-title":"Caenorhabditis Elegans 40x Coverage Dataset","author":"Biosciences Pacific","year":"2014","unstructured":"Pacific Biosciences. 2014. Caenorhabditis Elegans 40x Coverage Dataset. Retrieved April 29, 2022 from http:\/\/datasets.pacb.com.s3.amazonaws.com\/2014\/c_elegans\/list.html."},{"key":"e_1_3_2_114_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2009.20"},{"key":"e_1_3_2_115_2","article-title":"Best-effort FPGA programming: A few steps can go a long way","author":"Cong Jason","year":"2018","unstructured":"Jason Cong, Zhenman Fang, Yuchen Hao, Peng Wei, Cody Hao Yu, Chen Zhang, and Peipei Zhou. 2018. Best-effort FPGA programming: A few steps can go a long way. arXiv preprint arXiv:1807.01340 (2018).","journal-title":"arXiv preprint arXiv:1807.01340"},{"key":"e_1_3_2_116_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-26408-0_8"},{"key":"e_1_3_2_117_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439289"},{"key":"e_1_3_2_118_2","unstructured":"Xilinx. 2020. Xilinx UltraScale Plus Architecture. 
Retrieved April 29 2022 from https:\/\/www.xilinx.com\/products\/silicon-devices\/fpga\/virtex-ultrascale-plus.html."},{"key":"e_1_3_2_119_2","unstructured":"Intel. 2020. Intel Stratix 10 FPGA. Retrieved April 29 2022 from https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/stratix-10\/s10-overview.pdf."},{"key":"e_1_3_2_120_2","first-page":"1","article-title":"SODA: Stencil with optimized dataflow architecture","author":"Chi Yuze","year":"2018","unstructured":"Yuze Chi, Jason Cong, Peng Wei, and Peipei Zhou. 2018. SODA: Stencil with optimized dataflow architecture. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201918).1\u20138.","journal-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201918)."},{"key":"e_1_3_2_121_2","first-page":"493","article-title":"HeteroRefactor: Refactoring for heterogeneous computing with FPGA","author":"Lau Jason","year":"2020","unstructured":"Jason Lau, Aishwarya Sivaraman, Qian Zhang, Muhammad Ali Gulzar, Jason Cong, and Miryung Kim. 2020. HeteroRefactor: Refactoring for heterogeneous computing with FPGA. In Proceedings of the International Conference on Software Engineering (ICSE\u201920).493\u2013505.","journal-title":"Proceedings of the International Conference on Software Engineering (ICSE\u201920)."},{"key":"e_1_3_2_122_2","doi-asserted-by":"publisher","DOI":"10.1145\/2554688.2554780"},{"key":"e_1_3_2_123_2","first-page":"1","article-title":"S2FA: An accelerator automation framework for heterogeneous computing in datacenters","author":"Yu Cody Hao","year":"2018","unstructured":"Cody Hao Yu, Peng Wei, Max Grossman, Peng Zhang, Vivek Sarkar, and Jason Cong. 2018. S2FA: An accelerator automation framework for heterogeneous computing in datacenters. 
In Proceedings of the Design Automation Conference (DAC\u201918).1\u20136.","journal-title":"Proceedings of the Design Automation Conference (DAC\u201918)."},{"key":"e_1_3_2_124_2","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1145\/3289602.3293910","article-title":"HeteroCL: A multi-paradigm programming infrastructure for software-defined reconfigurable computing","author":"Lai Yi-Hsiang","year":"2019","unstructured":"Yi-Hsiang Lai, Yuze Chi, Yuwei Hu, Jie Wang, Cody Hao Yu, Yuan Zhou, Jason Cong, and Zhiru Zhang. 2019. HeteroCL: A multi-paradigm programming infrastructure for software-defined reconfigurable computing. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201919).242\u2013251.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201919)."},{"key":"e_1_3_2_125_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439478"},{"key":"e_1_3_2_126_2","article-title":"AutoDSE: Enabling software programmers design efficient FPGA accelerators","author":"Sohrabizadeh Atefeh","year":"2020","unstructured":"Atefeh Sohrabizadeh, Cody Hao Yu, Min Gao, and Jason Cong. 2020. AutoDSE: Enabling software programmers design efficient FPGA accelerators. arXiv preprint arXiv:2009.14381 (2020).","journal-title":"arXiv preprint arXiv:2009.14381"},{"key":"e_1_3_2_127_2","unstructured":"GitHub. n.d. The Merlin Compiler. 
Retrieved August 30 2021 from https:\/\/github.com\/Xilinx\/merlin-compiler.git."},{"key":"e_1_3_2_128_2","doi-asserted-by":"publisher","DOI":"10.1145\/2435264.2435273"},{"key":"e_1_3_2_129_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL50879.2020.00044"},{"key":"e_1_3_2_130_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2019.00057"},{"key":"e_1_3_2_131_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2016.26"},{"key":"e_1_3_2_132_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2857040"},{"key":"e_1_3_2_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2016.7760777"},{"key":"e_1_3_2_134_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC.2005.193931"},{"key":"e_1_3_2_135_2","doi-asserted-by":"publisher","DOI":"10.1145\/358438.349317"},{"key":"e_1_3_2_136_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2017.2720623"},{"key":"e_1_3_2_137_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.scico.2007.01.015"},{"key":"e_1_3_2_138_2","volume-title":"ROSE Compiler Framework","author":"Quinlan Dan","year":"2019","unstructured":"Dan Quinlan, Markus Schordan, Rob Mazke, Pei-Hung Lin, Jim Leek, Justin Too, Chuhua Liao, et\u00a0al. 2019. ROSE Compiler Framework. Technical Report. Lawrence Livermore National Lab (LLNL), Livermore, CA."},{"key":"e_1_3_2_139_2","first-page":"227","article-title":"Templatised soft floating-point for high-level synthesis","author":"Thomas David B.","year":"2019","unstructured":"David B. Thomas. 2019. Templatised soft floating-point for high-level synthesis. 
In Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919).227\u2013235.","journal-title":"Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM\u201919)."},{"key":"e_1_3_2_140_2","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1145\/3174243.3174255","article-title":"Rosetta: A realistic high-level synthesis benchmark suite for software programmable FPGAs","author":"Zhou Yuan","year":"2018","unstructured":"Yuan Zhou, Udit Gupta, Steve Dai, Ritchie Zhao, Nitish Srivastava, Hanchen Jin, Joseph Featherston, et\u00a0al. 2018. Rosetta: A realistic high-level synthesis benchmark suite for software programmable FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201918).269\u2013278.","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201918)."},{"key":"e_1_3_2_141_2","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2019.8875639"},{"key":"e_1_3_2_142_2","doi-asserted-by":"publisher","DOI":"10.1109\/CASES.2013.6662524"},{"key":"e_1_3_2_143_2","doi-asserted-by":"publisher","DOI":"10.1145\/2514740"},{"key":"e_1_3_2_144_2","unstructured":"PANDA. n.d. The PandA\/Bambu Project. Retrieved July 28 2021 from https:\/\/panda.dei.polimi.it\/."},{"key":"e_1_3_2_145_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCAS.2021.3071631"},{"key":"e_1_3_2_146_2","unstructured":"GitHub. n.d. Xilinx Vitis HLS LLVM 2020.2. Retrieved August 12 2021 from https:\/\/github.com\/Xilinx\/HLS."},{"key":"e_1_3_2_147_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO51591.2021.9370308"},{"key":"e_1_3_2_148_2","volume-title":"Proceedings of the Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE\u201921)","author":"Ye Hanchen","year":"2021","unstructured":"Hanchen Ye, Cong Hao, Hyunmin Jeong, Jack Huang, and Deming Chen. 2021. ScaleHLS: Achieving scalable high-level synthesis through MLIR. 
In Proceedings of the Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE\u201921)."},{"key":"e_1_3_2_149_2","article-title":"ScaleHLS: Scalable high-level synthesis through MLIR","author":"Ye Hanchen","year":"2021","unstructured":"Hanchen Ye, Cong Hao, Jianyi Cheng, Hyunmin Jeong, Jack Huang, Stephen Neuendorffer, and Deming Chen. 2021. ScaleHLS: Scalable high-level synthesis through MLIR. arXiv preprint arXiv:2107.11673 (2021).","journal-title":"arXiv preprint arXiv:2107.11673"},{"key":"e_1_3_2_150_2","doi-asserted-by":"publisher","DOI":"10.1145\/3107953"},{"key":"e_1_3_2_151_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375320"},{"key":"e_1_3_2_152_2","article-title":"Halide and GENESIS for generating domain-specific architecture of guided image filtering","author":"Ishikawa Akari","year":"2019","unstructured":"Akari Ishikawa, Norishige Fukushima, Akira Maruoka, and Takuro Iizuka. 2019. Halide and GENESIS for generating domain-specific architecture of guided image filtering. In Proceedings of the International Symposium on Circuits and Systems (ISCAS\u201919).","journal-title":"Proceedings of the International Symposium on Circuits and Systems (ISCAS\u201919)."},{"key":"e_1_3_2_153_2","doi-asserted-by":"crossref","DOI":"10.1145\/2491956.2462176","article-title":"Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines","author":"Ragan-Kelley Jonathan","year":"2013","unstructured":"Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Fr\u00e9do Durand, and Saman Amarasinghe. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. 
In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201913).","journal-title":"Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201913)."},{"key":"e_1_3_2_154_2","doi-asserted-by":"publisher","DOI":"10.1145\/2601097.2601174"},{"key":"e_1_3_2_155_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925892"},{"key":"e_1_3_2_156_2","article-title":"Generating FPGA-based image processing accelerators with Hipacc","author":"Reiche Oliver","year":"2017","unstructured":"Oliver Reiche, M. Akif \u00d6zkan, Richard Membarth, J\u00fcrgen Teich, and Frank Hannig. 2017. Generating FPGA-based image processing accelerators with Hipacc. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201917).","journal-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201917)."},{"key":"e_1_3_2_157_2","article-title":"TVM: An automated end-to-end optimizing compiler for deep learning","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, et\u00a0al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918).","journal-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918)."},{"key":"e_1_3_2_158_2","article-title":"VTA: An open hardware-software stack for deep learning","author":"Moreau Thierry","year":"2018","unstructured":"Thierry Moreau, Tianqi Chen, Ziheng Jiang, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. VTA: An open hardware-software stack for deep learning. 
arXiv preprint arXiv:1807.04188 (2018).","journal-title":"arXiv preprint arXiv:1807.04188"},{"key":"e_1_3_2_159_2","article-title":"OptiML: An implicitly parallel domain-specific language for machine learning","author":"Sujeeth Arvind K.","year":"2011","unstructured":"Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Tiark Rompf, Hassan Chafi, Michael Wu, Anand R. Atreya, Martin Odersky, and Kunle Olukotun. 2011. OptiML: An implicitly parallel domain-specific language for machine learning. In Proceedings of the International Conference on Machine Learning (ICML\u201911).","journal-title":"Proceedings of the International Conference on Machine Learning (ICML\u201911)."},{"key":"e_1_3_2_160_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.68"},{"key":"e_1_3_2_161_2","doi-asserted-by":"crossref","DOI":"10.1145\/3490422.3502369","article-title":"HeteroFlow: An accelerator programming model with decoupled data placement for software-defined FPGAs","author":"Xiang Shaojie","year":"2022","unstructured":"Shaojie Xiang, Yi-Hsiang Lai, Yuan Zhou, Hongzheng Chen, Niansong Zhang, Debjit Pal, and Zhiru Zhang. 2022. HeteroFlow: An accelerator programming model with decoupled data placement for software-defined FPGAs. 
In Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201922).","journal-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA\u201922)."},{"key":"e_1_3_2_162_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM51124.2021.00032"},{"key":"e_1_3_2_163_2","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446712"},{"key":"e_1_3_2_164_2","doi-asserted-by":"publisher","DOI":"10.1109\/IEEESTD.1996.81542"},{"key":"e_1_3_2_165_2","doi-asserted-by":"publisher","DOI":"10.1109\/IEEESTD.1994.121433"},{"key":"e_1_3_2_166_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3024098"},{"key":"e_1_3_2_167_2","article-title":"hlslib: Software engineering for hardware design","volume":"1910","author":"Licht Johannes de Fine","year":"2019","unstructured":"Johannes de Fine Licht and Torsten Hoefler. 2019. hlslib: Software engineering for hardware design. CoRR abs\/1910.04436 (2019). http:\/\/arxiv.org\/abs\/1910.04436.","journal-title":"CoRR"},{"key":"e_1_3_2_168_2","unstructured":"Khronos. 2021. SYCL 2020 Specification Revision 3. Retrieved April 29 2022 from https:\/\/www.khronos.org\/registry\/SYCL\/specs\/sycl-2020\/pdf\/sycl-2020.pdf."},{"key":"e_1_3_2_169_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4842-5574-2"},{"key":"e_1_3_2_170_2","first-page":"69","volume-title":"Proceedings of the International Conference on Formal Methods and Models for Co-Design (MEMOCODE\u201904)","author":"Nikhil Rishiyur","year":"2004","unstructured":"Rishiyur Nikhil. 2004. Bluespec System Verilog: Efficient, correct RTL from high level specifications. In Proceedings of the International Conference on Formal Methods and Models for Co-Design (MEMOCODE\u201904). 
IEEE, Los Alamitos, CA, 69\u201370."},{"key":"e_1_3_2_171_2","first-page":"1212","article-title":"Chisel: Constructing hardware in a Scala embedded language","author":"Bachrach Jonathan","year":"2012","unstructured":"Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman, Rimas Avi\u017eienis, John Wawrzynek, and Krste Asanovi\u0107. 2012. Chisel: Constructing hardware in a Scala embedded language. In Proceedings of the Design Automation Conference (DAC\u201912).1212\u20131221.","journal-title":"Proceedings of the Design Automation Conference (DAC\u201912)."},{"key":"e_1_3_2_172_2","article-title":"A modular digital VLSI flow for high-productivity SoC design","author":"Khailany Brucek","year":"2018","unstructured":"Brucek Khailany, Evgeni Krimer, Rangharajan Venkatesan, Jason Clemons, Joel S. Emer, Matthew Fojtik, Alicia Klinefelter, et\u00a0al. 2018. A modular digital VLSI flow for high-productivity SoC design. In Proceedings of the Design Automation Conference (DAC\u201918).","journal-title":"Proceedings of the Design Automation Conference (DAC\u201918)."},{"key":"e_1_3_2_173_2","article-title":"What you simulate is what you synthesize: Designing a processor core from C++ specifications","author":"Rokicki Simon","year":"2019","unstructured":"Simon Rokicki, Davide Pala, Joseph Paturel, and Olivier Sentieys. 2019. What you simulate is what you synthesize: Designing a processor core from C++ specifications. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201919).","journal-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201919)."},{"key":"e_1_3_2_174_2","article-title":"VTune Performance Analyzer Essentials","author":"Reinders James","year":"2005","unstructured":"James Reinders. 2005. VTune Performance Analyzer Essentials. 
Intel Press.","journal-title":"Intel Press."},{"key":"e_1_3_2_175_2","article-title":"GPU Performance Analysis and Optimisation","author":"Bradley Thomas","year":"2012","unstructured":"Thomas Bradley. 2012. GPU Performance Analysis and Optimisation. NVIDIA Corporation.","journal-title":"NVIDIA Corporation."},{"key":"e_1_3_2_176_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2017.8203844"},{"key":"e_1_3_2_177_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2017.44"},{"key":"e_1_3_2_178_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490422.3502361"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3530775","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3530775","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:25Z","timestamp":1750183765000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3530775"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,8]]},"references-count":177,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12,31]]}},"alternative-id":["10.1145\/3530775"],"URL":"https:\/\/doi.org\/10.1145\/3530775","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,8]]},"assertion":[{"value":"2021-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-08-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}