{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:28:17Z","timestamp":1750307297362,"version":"3.41.0"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2011,5,1]],"date-time":"2011-05-01T00:00:00Z","timestamp":1304208000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DEAC04-94AL85000"],"award-info":[{"award-number":["DEAC04-94AL85000"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2011,5]]},"abstract":"<jats:p>Modern scientific applications are large, complex, and highly parallel; they are commonly executed on supercomputers with tens of thousands of processors. Yet these applications still commonly require weeks or even months to execute. Thus, single-thread performance remains a concern for highly parallel scientific applications. Adding a reconfigurable accelerator to each CPU could improve system performance; however, scientific applications have design constraints that differ from most application domains commonly accelerated by reconfigurable logic. In this article, we discuss the constraints imposed by scientific applications on the computation model, the accelerator architecture, and the accelerator\u2019s communication interface with the CPU. Based on these constraints and application analysis, we have previously proposed adding a Reconfigurable Functional Unit (RFU) to accelerate integer graphs that calculate complex memory addresses. 
In this work, we now propose a flexible multi-instruction interface technique that allows dataflow graphs implemented on the RFU to access a large number of inputs and outputs with minor CPU datapath modifications. We present an in-depth examination of the performance effects of different communication interfaces that use this technique, and select one that best matches the needs of Sandia\u2019s scientific applications. Although RFU execution overall improves performance, we also isolate two key negative performance effects introduced by aggregating CPU instructions into dataflow graphs: delayed issue and graph serialization. Finally, to demonstrate the marketability of an RFU beyond scientific applications, we reanalyze the proposed interfaces using the SPEC-fp benchmark suite. We show that although choosing an interface based on SPEC-fp needs is detrimental to Sandia application performance, choosing an interface based on Sandia demands works well for more general-purpose applications.<\/jats:p>","DOI":"10.1145\/1968502.1968510","type":"journal-article","created":{"date-parts":[[2011,6,6]],"date-time":"2011-06-06T11:51:38Z","timestamp":1307361098000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Scientific Application Demands on a Reconfigurable Functional Unit Interface"],"prefix":"10.1145","volume":"4","author":[{"given":"Kyle","family":"Rupnow","sequence":"first","affiliation":[{"name":"University of Wisconsin-Madison and Sandia National Laboratories"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Keith D.","family":"Underwood","sequence":"additional","affiliation":[{"name":"Intel Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Katherine","family":"Compton","sequence":"additional","affiliation":[{"name":"University of 
Wisconsin-Madison"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2011,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2005.1568535"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2008.4580145"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.15"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.45"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/581630.581672"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Burger D. and Austin T. M. 1997. The Simplescalar tool set version 2.0. Tech. rep. CS-TR-97-1342.","DOI":"10.1145\/268806.268810"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.9"},{"volume-title":"Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201936)","author":"Clark N.","key":"e_1_2_1_8_1","unstructured":"Clark, N., Zhong, H., and Mahlke, S. 2003. Processor acceleration through automated instruction set customization. In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201936). IEEE Computer Society, 129."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/508352.508353"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046206"},
{"volume-title":"Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI. 23--40","author":"Cronquist D. C.","key":"e_1_2_1_11_1","unstructured":"Cronquist, D. C., Fisher, C., Figueroa, M., Franklin, P., and Ebeling, C. 1999. Architecture design of reconfigurable pipelined datapaths. In Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI. 23--40."},{"volume-title":"Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM\u201904)","author":"Dehon A.","key":"e_1_2_1_12_1","unstructured":"Dehon, A., Adams, J., Delorimier, M., Kapre, N., Matsuda, Y., Naeimi, H., Vanier, M., and Wrighton, M. 2004. Design patterns for reconfigurable computing. In Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM\u201904). 13--23."},{"volume-title":"Proceedings of the IEEE International Conference on Field-Programmable Technology. 73--80","author":"Evans J.","key":"e_1_2_1_13_1","unstructured":"Evans, J., Rupnow, K., and Compton, K. 2007. Reconfigurable functional units for scientific superscalar processors. In Proceedings of the IEEE International Conference on Field-Programmable Technology. 73--80."},
{"key":"e_1_2_1_14_1","first-page":"1","article-title":"A linear complexity algorithm for the automatic generation of convex multiple input multiple output instructions","volume":"9","author":"Galuzzi C.","year":"2008","unstructured":"Galuzzi, C., Bertels, K., and Vassiliadis, S. 2008. A linear complexity algorithm for the automatic generation of convex multiple input multiple output instructions. Int. J. Electron. 9, 1--17.","journal-title":"Int. J. Electron."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the Supercomputer Best Practices Symposium. 1--2.","author":"Gara A.","year":"2005","unstructured":"Gara, A. 2005. Blue gene\/l architecture. In Proceedings of the Supercomputer Best Practices Symposium. 1--2."},{"volume-title":"Proceedings of the 5th IEEE Symposium on FPGAs for Custom Computing Machines (FCCM\u201997)","author":"Hauck S.","key":"e_1_2_1_16_1","unstructured":"Hauck, S., Fry, T. W., Hosler, M. M., and Kao, J. P. 1997. The Chimaera reconfigurable functional unit. In Proceedings of the 5th IEEE Symposium on FPGAs for Custom Computing Machines (FCCM\u201997). 87."},
{"volume-title":"Proceedings of the 5th IEEE Symposium on FPGAs for Custom Computing Machines (FCCM\u201997)","author":"Hauser J. R.","key":"e_1_2_1_17_1","unstructured":"Hauser, J. R. and Wawrzynek, J. 1997. Garp: A MIPS processor with a reconfigurable coprocessor. In Proceedings of the 5th IEEE Symposium on FPGAs for Custom Computing Machines (FCCM\u201997). IEEE Computer Society Press, 12--21."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.869367"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/605440.605446"},{"volume-title":"CSRI Summer Proceedings. 2--22","author":"La Fratta P.","key":"e_1_2_1_20_1","unstructured":"La Fratta, P., Rodrigues, A., and Underwood, K. D. 2007. Architectural extensions for executing floating point instruction aggregates. In CSRI Summer Proceedings. 2--22."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/43.238612"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/192724.192749"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183401.1183413"},
{"volume-title":"Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines. 22--28","author":"Trimberger S.","key":"e_1_2_1_24_1","unstructured":"Trimberger, S., Carberry, D., Johnson, A., and Wong, J. 1997. A time-multiplexed FPGA. In Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines. 22--28."},{"volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201907)","author":"Underwood K. D.","key":"e_1_2_1_25_1","unstructured":"Underwood, K. D., Levenhagen, M., and Rodrigues, A. 2007. Simulating red storm: Challenges and successes in building a system simulation. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201907). 1--10."},{"key":"e_1_2_1_26_1","volume-title":"Eds","author":"Wetzel J.","year":"2003","unstructured":"Wetzel, J., Silha, E., May, C., and Frey, B., Eds. 2003. PowerPC User Instruction Set Architecture, Book 1, Version 2.01. IBM."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/228370.228388"},{"volume-title":"Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM\u201996)","author":"Wittig R. D.","key":"e_1_2_1_28_1","unstructured":"Wittig, R. D. and Chow, P. 1996. OneChip: An FPGA processor with reconfigurable logic. In Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM\u201996). 126--135."}],
"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1968502.1968510","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1968502.1968510","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:59:48Z","timestamp":1750244388000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1968502.1968510"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,5]]},"references-count":28,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2011,5]]}},"alternative-id":["10.1145\/1968502.1968510"],"URL":"https:\/\/doi.org\/10.1145\/1968502.1968510","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2011,5]]},"assertion":[{"value":"2008-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-05-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}