{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T05:44:54Z","timestamp":1750743894839,"version":"3.41.0"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,8,12]],"date-time":"2021-08-12T00:00:00Z","timestamp":1628726400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"EU H2020 Research and Innovation Programme","award":["671653"],"award-info":[{"award-number":["671653"]}]},{"name":"UK EPSRC","award":["EP\/P010040\/1, EP\/N031768\/1, and EP\/L016796\/1"],"award-info":[{"award-number":["EP\/P010040\/1, EP\/N031768\/1, and EP\/L016796\/1"]}]},{"name":"JST\/CREST program \u201cResearch and Development on Unified Environment of Accelerated Computing and Interconnection for Post-Petascale Era\u201d"},{"name":"JSPS KAKENHI","award":["20K19770"],"award-info":[{"award-number":["20K19770"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2021,9,30]]},"abstract":"<jats:p>Next-generation high-performance computing platforms will handle extreme data- and compute-intensive problems that are intractable with today\u2019s technology. A promising path in achieving the next leap in high-performance computing is to embrace heterogeneity and specialised computing in the form of reconfigurable accelerators such as FPGAs, which have been shown to speed up compute-intensive tasks with reduced power consumption. However, assessing the feasibility of large-scale heterogeneous systems requires fast and accurate performance prediction. 
This article proposes Performance Estimation for Reconfigurable Kernels and Systems (PERKS), a novel performance estimation framework for reconfigurable dataflow platforms. PERKS makes use of an analytical model with machine and application parameters for predicting the performance of multi-accelerator systems and detecting their bottlenecks. Model calibration is automatic, making the model flexible and usable for different machine configurations and applications, including hypothetical ones. Our experimental results show that PERKS can predict the performance of current workloads on reconfigurable dataflow platforms with an accuracy above 91%. The results also illustrate how the modelling scales to large workloads, and how performance impact of architectural features can be estimated in seconds.<\/jats:p>","DOI":"10.1145\/3452742","type":"journal-article","created":{"date-parts":[[2021,8,12]],"date-time":"2021-08-12T14:51:22Z","timestamp":1628779882000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Analytical Performance Estimation for Large-Scale Reconfigurable Dataflow Platforms"],"prefix":"10.1145","volume":"14","author":[{"given":"Ryota","family":"Yasudo","sequence":"first","affiliation":[{"name":"Hiroshima University, Hiroshima, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9 G. 
F.","family":"Coutinho","sequence":"additional","affiliation":[{"name":"Imperial College London, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ana-Lucia","family":"Varbanescu","sequence":"additional","affiliation":[{"name":"University of Amsterdam, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wayne","family":"Luk","sequence":"additional","affiliation":[{"name":"Imperial College London, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hideharu","family":"Amano","sequence":"additional","affiliation":[{"name":"Keio University, Kanagawa, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tobias","family":"Becker","sequence":"additional","affiliation":[{"name":"Maxeler Technologies Ltd., UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ce","family":"Guo","sequence":"additional","affiliation":[{"name":"Imperial College London, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,8,12]]},"reference":[{"volume-title":"Retrieved","year":"2020","key":"e_1_2_1_1_1","unstructured":"Amazon. 2020. Amazon EC2 F1 Instances. Retrieved May 22, 2021 from https:\/\/aws.amazon.com\/ec2\/instance-types\/f1\/."},{"volume-title":"Retrieved","year":"2020","key":"e_1_2_1_2_1","unstructured":"Maxeler. 2020. Maxeler AppGallery. Retrieved May 22, 2021 from http:\/\/appgallery.maxeler.com\/."},{"volume-title":"Retrieved","year":"2020","key":"e_1_2_1_3_1","unstructured":"Maxeler. 2020. Maxeler Technologies Home Page. 
Retrieved May 22, 2021 from http:\/\/maxeler.com\/."},{"volume-title":"Retrieved","year":"2020","key":"e_1_2_1_4_1","unstructured":"TOP500. 2020. TOP500 Supercomputer Sites. Retrieved May 22, 2021 from https:\/\/www.top500.org\/lists\/2020\/11\/."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080216"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2016.2535385"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689066"},{"volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software.163\u2013174","author":"Bakhoda A.","key":"e_1_2_1_9_1","unstructured":"A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt. 2009. Analyzing CUDA workloads using a detailed GPU simulator. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. 163\u2013174."},{"volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software.","author":"Balaprakash P.","key":"e_1_2_1_10_1","unstructured":"P. Balaprakash, D. Buntinas, A. Chan, A. Guha, R. Gupta, S. H. K. Narayanan, A. A. Chien, P. Hovland, and B. Norris. 2013. Exascale workload characterization and architecture implications. 
In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/1523254"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/3130379.3130474"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.950795"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1014535317056"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2999539"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2010.43"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241793.3241802"},{"volume-title":"Proceedings of the 28th International Conference on Field Programmable Logic and Applications (FPL\u201918)","author":"Cross A.-I.","key":"e_1_2_1_18_1","unstructured":"A.-I. Cross, L. Guo, W. Luk, and M. Salmon. 2018. CRRS: Custom regression and regularisation solver for large-scale linear systems. 
In Proceedings of the 28th International Conference on Field Programmable Logic and Applications (FPL\u201918)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1661438.1661443"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1155\/2013\/428078"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1980.1653418"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2013.111"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.3211107"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/578533"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACSD.2006.33"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/3104322.3104326"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPA.2014.31"},{"key":"e_1_2_1_28_1","volume-title":"Turing Award Lecture. Retrieved","author":"Hennessy J.","year":"2021","unstructured":"J. Hennessy and D. Patterson. 2018. A New Golden Age for Computer Architecture. Turing Award Lecture. Retrieved May 22, 2021 from http:\/\/iscaconf.org\/isca2018\/docs\/HennessyPattersonTuringLectureISCA4June2018.pdf."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555815.1555775"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-32820-6_90"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735696"},{"key":"e_1_2_1_32_1","volume-title":"Design space exploration of stream-based dataflow architectures. Nederlands Elektronica en Radiogenootschap 64, 5","author":"Kienhuis Al. C. J.","year":"1999","unstructured":"Al. C. J. Kienhuis. 1999. Design space exploration of stream-based dataflow architectures. 
Nederlands Elektronica en Radiogenootschap 64, 5 (1999), 191."},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications. 1\u20136.","author":"Gan L.","key":"e_1_2_1_33_1","unstructured":"L. Gan, H. Fu, C. Yang, W. Luk, W. Xue, O. Mencer, X. Huang, and G. Yang. 2014. A highly-efficient and green data flow engine for solving Euler atmospheric equations. In Proceedings of the International Conference on Field Programmable Logic and Applications. 1\u20136."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1987.13876"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751220"},{"volume-title":"Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops.","author":"Nestorov A. M.","key":"e_1_2_1_36_1","unstructured":"A. M. Nestorov, E. Reggiani, H. Palikareva, P. Burovskiy, T. Becker, and M. D. Santambrogio. 2017. A scalable dataflow implementation of Curran\u2019s approximation algorithm. 
In Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2997465.2997472"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPA.2011.36"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.335.6067.394"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/1964238.1964240"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370865"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342014568690"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"volume-title":"Proceedings of the International Conference on Field-Programmable Technology (FPT\u201918)","author":"Yasudo R.","key":"e_1_2_1_44_1","unstructured":"R. Yasudo, J. Coutinho, A. Varbanescu, W. Luk, H. Amano, and T. Becker. 2018. Performance estimation for exascale reconfigurable dataflow platforms. In Proceedings of the International Conference on Field-Programmable Technology (FPT\u201918). 
314\u2013317."}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452742","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3452742","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:27Z","timestamp":1750193247000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452742"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,12]]},"references-count":43,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,30]]}},"alternative-id":["10.1145\/3452742"],"URL":"https:\/\/doi.org\/10.1145\/3452742","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2021,8,12]]},"assertion":[{"value":"2020-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-08-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}