{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:15:00Z","timestamp":1750220100376,"version":"3.41.0"},"reference-count":96,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,8,8]],"date-time":"2022-08-08T00:00:00Z","timestamp":1659916800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2022,12,31]]},"abstract":"<jats:p>\n            We present a High Level Synthesis compiler that automatically obtains a multi-chip accelerator system from a single-threaded sequential C\/C++ application. Invoking the multi-chip accelerator is functionally identical to invoking the single-threaded sequential code the multi-chip accelerator is compiled from. Therefore, software development for using the multi-chip accelerator hardware is simplified, but the multi-chip accelerator can exhibit extremely high parallelism. We have implemented, tested, and verified our push-button system design model on multiple field-programmable gate arrays (FPGAs) of the Amazon Web Services EC2 F1 instances platform, using, as an example, a sequential-natured DES key search application that does not have any DOALL loops and that tries each candidate key in order and stops as soon as a correct key is found. An 8-FPGA accelerator produced by our compiler achieves\n            <jats:bold>44,600<\/jats:bold>\n            times better performance than an x86 Xeon CPU executing the sequential single-threaded C program the accelerator was compiled from. 
New features of our compiler system include: an ability to parallelize outer loops with loop-carried control dependences, an ability to pipeline an outer loop without fully unrolling its inner loops, and fully automated deployment, execution and termination of multi-FPGA application-specific accelerators in the AWS cloud, without requiring any manual steps.\n          <\/jats:p>","DOI":"10.1145\/3507698","type":"journal-article","created":{"date-parts":[[2022,5,16]],"date-time":"2022-05-16T12:46:46Z","timestamp":1652705206000},"page":"1-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Highly Parallel Multi-FPGA System Compilation from Sequential C\/C++ Code in the AWS Cloud"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6256-4248","authenticated-orcid":false,"given":"Kemal","family":"Ebcioglu","sequence":"first","affiliation":[{"name":"Global Supercomputing Corporation, Yorktown Heights, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3005-1813","authenticated-orcid":false,"given":"Ismail","family":"San","sequence":"additional","affiliation":[{"name":"Eskisehir Technical University, Department of Electrical and Electronics Engineering, Iki Eylul Kampus, Eskisehir, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,8,8]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-017-2120-9"},{"key":"e_1_3_2_3_2","unstructured":"Alibaba Cloud. 2021. Deep Dive into Alibaba Cloud F3 FPGA as a Service Instances. Retrieved from https:\/\/www.alibabacloud.com\/blog\/deep-dive-into-alibaba-cloud-f3-fpga-as-a-service-instances_594057. Accessed: 2021-04-19."},{"key":"e_1_3_2_4_2","unstructured":"Alibaba Cloud. 2021. FPGA-accelerated compute optimized instance family. 
Retrieved from https:\/\/www.alibabacloud.com\/help\/doc-detail\/108504.htm. Accessed: 2021-04-19."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/212094.212131"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.5555\/910630"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/29873.29875"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/29873.29875"},{"key":"e_1_3_2_9_2","unstructured":"Amazon. 2021. Amazon EC2 Placement Groups. Retrieved from https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/placement-groups.html. [Online; accessed 11-April-2021]."},{"key":"e_1_3_2_10_2","unstructured":"Amazon EC2. 2021. AWS EC2 FPGA Development Kit. Retrieved 28-June-2022 from https:\/\/github.com\/aws\/aws-fpga."},{"key":"e_1_3_2_11_2","unstructured":"Amazon Elastic Cloud Compute. 2021. Amazon EC2 F1 Instances. Retrieved from https:\/\/aws.amazon.com\/ec2\/instance-types\/f1\/. Accessed: 2021-04-19."},{"key":"e_1_3_2_12_2","unstructured":"Amazon FPGA Development User Forum. 2021. FPGA Development - AWS Developer Forums. Retrieved from https:\/\/forums.aws.amazon.com\/forum.jspa?forumID=243&start=0. Accessed: 2021-04-19."},{"key":"e_1_3_2_13_2","unstructured":"K. Gostelow Arvind and Wil Plouffe. 1978. ID Report : An Asynchronous Programming Language and Computing Machine. 
Technical Report 114 University of California at Irvine Computer Science Department May 1978."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2013.2278385"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2017.01.029"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/NTMS.2012.6208693"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/857076.857077"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/359576.359579"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3242897"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/N-SSC.2007.4785534"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(98)00020-9"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/1950413.1950423"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195647"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3241793.3241795"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3294054"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-0348-8534-8_21"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/2934583.2953984"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2905012"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2011.2110592"},{"key":"e_1_3_2_30_2","unstructured":"crack.sh. 2021. The World\u2019s Fastest DES Cracker. Retrieved from https:\/\/crack.sh\/. Accessed: 2021-04-20."},{"key":"e_1_3_2_31_2","article-title":"Useful parallelism in a multiprocessing environment","author":"Cytron Ron","year":"1985","unstructured":"Ron Cytron. 1985. Useful parallelism in a multiprocessing environment. IBM Thomas J. Watson Research Division.","journal-title":"IBM Thomas J. 
Watson Research Division"},{"key":"e_1_3_2_32_2","first-page":"836","volume-title":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","author":"Cytron Ron","year":"1986","unstructured":"Ron Cytron. 1986. Doacross: Beyond vectorization for multiprocessors. In Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. IEEE, 836\u2013844."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/99.660313"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021754"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2018.112130030"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.1974.1050511"},{"key":"e_1_3_2_37_2","volume-title":"Proceedings of the Information Processing, IFIP Congress 1968.","author":"Dennis Jack B.","year":"1968","unstructured":"Jack B. Dennis. 1968. Programming generality, parallelism and computer architecture, In Proceedings of the Information Processing, IFIP Congress 1968."},{"key":"e_1_3_2_38_2","first-page":"47","volume-title":"On the Design and Specification of a Common Base Language","author":"Dennis Jack B.","year":"1972","unstructured":"Jack B. Dennis. 1972. On the Design and Specification of a Common Base Language. Technical Report. MASSACHUSETTS INST OF TECH CAMBRIDGE PROJECT MAC. 47\u201374 pages."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-06859-7_145"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1980.1653418"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2017.74"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466822"},{"key":"e_1_3_2_43_2","unstructured":"Ebcioglu Kemal and Kultursay Emre and Kandemir Mahmut Taylan. 2015. Method and system for converting a single-threaded software program into an application-specific supercomputer. 
Retrieved from https:\/\/patents.google.com\/patent\/US8966457B2. US Patent 8 966 457 filed on 15 November 2011."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/255305.255317"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.5555\/320080.320124"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.5555\/92402.92419"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000108"},{"key":"e_1_3_2_48_2","unstructured":"EUROPRACTICE. 2021. Multi Layer Mask. Retrieved from https:\/\/europractice-ic.com\/mpw-prototyping\/general\/mlm\/. Accessed: 2021-04-28."},{"key":"e_1_3_2_49_2","unstructured":"EUROPRACTICE. 2021. Multi Project Wafer (MPW). Retrieved from https:\/\/europractice-ic.com\/mpw-prototyping\/general\/mpw-minisic\/. Accessed: 2021-04-28."},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.77"},{"key":"e_1_3_2_51_2","volume-title":"Cracking DES: Secrets of Encryption Research, Wiretap Politics and Chip Design","author":"Foundation The Electronic Frontier","year":"1998","unstructured":"The Electronic Frontier Foundation. 1998. Cracking DES: Secrets of Encryption Research, Wiretap Politics and Chip Design. O\u2019Reilly & Associates, Inc."},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.509907"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2006.11.009"},{"key":"e_1_3_2_54_2","first-page":"81","volume-title":"AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs","author":"Guo Licheng","year":"2021","unstructured":"Licheng Guo, Yuze Chi, Jie Wang, Jason Lau, Weikang Qiao, Ecenur Ustun, Zhiru Zhang, and Jason Cong. 2021. AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs. 
Association for Computing Machinery, New York, NY, 81\u201392."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTR.2002.1137760"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304619"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2019.8916407"},{"key":"e_1_3_2_58_2","unstructured":"IBM Cloud & AI Systems Research. 2021. cloudFPGA: Field programmable gate arrays for the cloud. Retrieved from https:\/\/www.zurich.ibm.com\/cci\/cloudFPGA\/. Accessed: 2021-04-19."},{"key":"e_1_3_2_59_2","unstructured":"Intel. 2021. Data Plane Development Kit. Retrieved from http:\/\/dpdk.org. Accessed 04.04.2021."},{"key":"e_1_3_2_60_2","unstructured":"Intel. 2021. Intel FPGA SDK for OpenCL. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/software\/programmable\/sdk-for-opencl\/overview.html. Accessed: 2021-04-25."},{"key":"e_1_3_2_61_2","unstructured":"Intel. 2021. Intel High Level Synthesis Compiler. Retrieved from https:\/\/www.intel.com.tr\/content\/www\/tr\/tr\/software\/programmable\/quartus-prime\/hls-compiler.html. Accessed: 2021-04-25."},{"key":"e_1_3_2_62_2","unstructured":"Jaime Humberto Moreno and Mayan Moudgill. 1998. Method and apparatus for reordering memory operations in a processor. Retrieved from https:\/\/patents.google.com\/patent\/US5758051. US Patent 5 758 051 filed on 6 November 1996."},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2020.101775"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/190314.190324"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4613-1705-0_5"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3098822.3098842"},{"key":"e_1_3_2_67_2","unstructured":"James Larus. 2015. Whole Program Paths (slides). Retrieved from https:\/\/pdfs.semanticscholar.org\/6328\/1ffa177c6d88841ddc6e01d1b0a74ea853e0.pdf. 
Accessed 11.08.2021."},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2664067"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2783363"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.25"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAIE.2016.7575063"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/267959.269966"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2513673"},{"key":"e_1_3_2_74_2","unstructured":"NVIDIA. 2021. NVIDIA CUDA platform. Retrieved from https:\/\/developer.nvidia.com\/cuda-zone. Accessed 29.04.2021."},{"key":"e_1_3_2_75_2","unstructured":"U.S. DEPARTMENT OF COMMERCE\/National Institute of Standards and Technology. 1999. FIPS PUB 46-3 Data Encryption Standard (DES). Retrieved from https:\/\/csrc.nist.gov\/csrc\/media\/publications\/fips\/46\/3\/archive\/1999-10-25\/documents\/fips46-3.pdf. Accessed 30.04.2021."},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/192724.192731"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1080\/19393555.2012.660678"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2013.10.013"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/6.591665"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(91)90118-S"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830821"},{"key":"e_1_3_2_82_2","first-page":"478","volume-title":"Proceedings of the International Conference on Dependability and Complex Systems","author":"Sugier Jaros\u0142aw","year":"2019","unstructured":"Jaros\u0142aw Sugier. 2019. Cracking the DES cipher with cost-optimized FPGA devices. In Proceedings of the International Conference on Dependability and Complex Systems. 
Springer, Cham, 478\u2013487."},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2020.101908"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.5555\/144953.144977"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1145\/3359983"},{"issue":"8","key":"e_1_3_2_86_2","first-page":"1185","article-title":"HEAWS: An accelerator for homomorphic encryption on the amazon AWS FPGA","volume":"69","author":"Turan Furkan","year":"2020","unstructured":"Furkan Turan, Sujoy Sinha Roy, and Ingrid Verbauwhede. 2020. HEAWS: An accelerator for homomorphic encryption on the amazon AWS FPGA. IEEE Transactions on Computers 69, 8 (2020), 1185\u20131196.","journal-title":"IEEE Transactions on Computers"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.17487\/RFC0908"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2010.07.010"},{"key":"e_1_3_2_89_2","first-page":"3","volume-title":"Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism","author":"Wang Jiang","year":"1993","unstructured":"Jiang Wang and Christine Eisenbeis. 1993. Decomposed software pipelining: A new approach to exploit instruction level parallelism for loop programs. In Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism. North-Holland Publishing Co., NLD, 3\u201314."},{"key":"e_1_3_2_90_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021730"},{"key":"e_1_3_2_91_2","doi-asserted-by":"crossref","first-page":"4725","DOI":"10.1109\/ISCAS.2005.1465688","volume-title":"Proceedings of the 2005 IEEE International Symposium on Circuits and Systems","author":"Wu Meng-Chiou","year":"2005","unstructured":"Meng-Chiou Wu and Rung-Bin Lin. 2005. Multiple project wafers for medium-volume IC production. 
In Proceedings of the 2005 IEEE International Symposium on Circuits and Systems. IEEE, Kobe, Japan, 4725\u20134728."},{"key":"e_1_3_2_92_2","volume-title":"Vivado Design Suite User Guide: High-Level Synthesis (UG902) (v2020.1 ed.)","author":"Xilinx","year":"2020","unstructured":"Xilinx 2020. Vivado Design Suite User Guide: High-Level Synthesis (UG902) (v2020.1 ed.). Xilinx. Retrieved from https:\/\/www.xilinx.com\/support\/documentation\/sw_manuals\/xilinx2020_1\/ug902-vivado-high-level-synthesis.pdf."},{"key":"e_1_3_2_93_2","unstructured":"Xilinx. 2021. Vitis High-Level Synthesis. Retrieved from http:\/\/www.xilinx.com\/products\/design-tools\/vivado\/integration\/esl-design.html. Accessed: 2021-04-25."},{"key":"e_1_3_2_94_2","volume-title":"Vivado Design Suite User Guide: Implementation (UG904) (v2020.2 ed.)","author":"Xilinx","year":"2021","unstructured":"Xilinx 2021. Vivado Design Suite User Guide: Implementation (UG904) (v2020.2 ed.). Xilinx. Retrieved from https:\/\/www.xilinx.com\/content\/dam\/xilinx\/support\/documentation\/sw_manuals\/xilinx2020_2\/ug904-vivado-implem-entation.pdf."},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1145\/2744769.2744807"},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00044"},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","DOI":"10.1145\/1273440.1250668"}],"container-title":["ACM Transactions on Reconfigurable Technology and 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3507698","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3507698","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:15Z","timestamp":1750183815000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3507698"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,8]]},"references-count":96,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12,31]]}},"alternative-id":["10.1145\/3507698"],"URL":"https:\/\/doi.org\/10.1145\/3507698","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2022,8,8]]},"assertion":[{"value":"2021-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-08-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}