{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T15:32:10Z","timestamp":1758123130829,"version":"3.41.0"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2015,9,11]],"date-time":"2015-09-11T00:00:00Z","timestamp":1441929600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput. Syst."],"published-print":{"date-parts":[[2015,9,11]]},"abstract":"<jats:p>There has been recent interest in exploring the acceleration of nonvectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the overheads of traditional shared-memory coherent memory. In this article, we explore solving these problems using triggered instructions and latency-insensitive channels. Triggered instructions completely eliminate the program counter (PC) and allow programs to transition concisely between states without explicit branch instructions. Latency-insensitive channels allow efficient communication of inter-PE control information while simultaneously enabling flexible code placement and improving tolerance for variable events such as cache accesses. 
Together, these approaches provide a unified mechanism to avoid overserialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading.<\/jats:p>\n          <jats:p>Our analysis shows that a spatial accelerator using triggered instructions and latency-insensitive channels can achieve 8 \u00d7 greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, over a PC-style baseline, increasing the performance of the spatial programming approach by 2.0 \u00d7.<\/jats:p>","DOI":"10.1145\/2754930","type":"journal-article","created":{"date-parts":[[2015,9,15]],"date-time":"2015-09-15T12:09:15Z","timestamp":1442318955000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures"],"prefix":"10.1145","volume":"33","author":[{"given":"Michael","family":"Pellauer","sequence":"first","affiliation":[{"name":"Intel, NVIDIA, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Angshuman","family":"Parashar","sequence":"additional","affiliation":[{"name":"Intel, NVIDIA, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Adler","sequence":"additional","affiliation":[{"name":"Intel, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bushra","family":"Ahsan","sequence":"additional","affiliation":[{"name":"Intel, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Randy","family":"Allmon","sequence":"additional","affiliation":[{"name":"Intel, Hudson, 
MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Neal","family":"Crago","sequence":"additional","affiliation":[{"name":"Intel, NVIDIA, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kermin","family":"Fleming","sequence":"additional","affiliation":[{"name":"Intel, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohit","family":"Gambhir","sequence":"additional","affiliation":[{"name":"Intel, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aamer","family":"Jaleel","sequence":"additional","affiliation":[{"name":"Intel, NVIDIA, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tushar","family":"Krishna","sequence":"additional","affiliation":[{"name":"Intel, Georgia Institute of Technology, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Lustig","sequence":"additional","affiliation":[{"name":"Princeton University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephen","family":"Maresh","sequence":"additional","affiliation":[{"name":"Intel, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vladimir","family":"Pavlov","sequence":"additional","affiliation":[{"name":"Intel, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rachid","family":"Rayess","sequence":"additional","affiliation":[{"name":"Intel, Hudson, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonia","family":"Zhai","sequence":"additional","affiliation":[{"name":"University of Minnesota"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joel","family":"Emer","sequence":"additional","affiliation":[{"name":"Intel and MIT, NVIDIA, Hudson, 
MA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,9,11]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.48862"},{"key":"e_1_2_1_2_1","volume-title":"Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick.","author":"Asanovic Krste","year":"2006","unstructured":"Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. 2006. The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report UCB\/EECS-2006-183. EECS Department, University of California, Berkeley."},{"key":"e_1_2_1_3_1","unstructured":"Bluespec Inc. 2007. Bluespec System Verilog Reference Guide. Bluespec."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2004.65"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/43.945302"},{"volume-title":"Parallel Program Design: A Foundation","author":"Mani Chandy K.","key":"e_1_2_1_6_1","unstructured":"K. Mani Chandy and Jayadev Misra. 1988. Parallel Program Design: A Foundation. 
Addison-Wesley."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/508352.508353"},{"volume-title":"Principles and Practices of Interconnection Networks. Morgan Kaufmann","author":"Dally William","key":"e_1_2_1_8_1","unstructured":"William Dally and Brian Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann, San Francisco, CA."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/642089.642111"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/360933.360975"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.982918"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/800015.808199"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145694.2145725"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/2014698.2014884"},{"volume-title":"Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM\u201997)","author":"John","key":"e_1_2_1_16_1","unstructured":"John R. Hauser and John Wawrzynek. 1997. Garp: A MIPS processor with a reconfigurable coprocessor. In Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM\u201997). 12--21."},{"key":"e_1_2_1_17_1","series-title":"Lecture Notes in Computer Science","volume-title":"Compiler Construction","author":"Jan Hoogerbrugge and Henk Corp","unstructured":"Jan Hoogerbrugge and Henk Corporaal. 1994. Transport-triggering vs. 
operation-triggering. In Compiler Construction. Lecture Notes in Computer Science, Vol. 786. Springer, 435--449."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2151011"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1137\/0206024"},{"volume-title":"Supercomputers: Algorithms, Architectures, and Scientific Computation","author":"Kung Hsiang-Tsung","key":"e_1_2_1_20_1","unstructured":"Hsiang-Tsung Kung. 1986. The CMU warp processor. In Supercomputers: Algorithms, Architectures, and Scientific Computation, F. A. Matsen and T. Tajima (Eds.). University of Texas Press, Austin, TX, 235--247."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/92.820764"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45234-8_7"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854344"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPGA.1996.564808"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-006-0019-9"},{"key":"e_1_2_1_26_1","unstructured":"Li-Shiuan Peh and Natalie Enright Jerger. 2009. On-Chip Networks. Morgan and Claypool."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629911.1629986"},{"volume-title":"Proceedings of the 2002 IEEE Custom Integrated Circuits Conference. 63--66","author":"Schmit Herman","key":"e_1_2_1_28_1","unstructured":"Herman Schmit, David Whelihan, Andrew Tsai, Matthew Moe, Benjamin Levine, and R. Reed Taylor. 2002. PipeRench: A virtualized programmable datapath in 0.18 micron technology. In Proceedings of the 2002 IEEE Custom Integrated Circuits Conference. 63--66."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.17"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1233307.1233308"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2002.997877"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2009.2013772"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/1715759.1715781"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339687"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the Solid-State Circuits Conference (ISSCC\u201906)","author":"Yu Zhiyi","year":"2006","unstructured":"Zhiyi Yu, Michael Meeuwsen, Ryan Apperson, Omar Sattari, Michael Lai, Jeremy Webb, Eric Work, Tinoosh Mohsenin, Mandeep Singh, and Bevan Baas. 2006. An asynchronous array of simple processors for DSP applications. In Proceedings of the Solid-State Circuits Conference (ISSCC\u201906). 
1696--1705."}],"container-title":["ACM Transactions on Computer Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2754930","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2754930","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:16:40Z","timestamp":1750227400000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2754930"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,9,11]]},"references-count":34,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2015,9,11]]}},"alternative-id":["10.1145\/2754930"],"URL":"https:\/\/doi.org\/10.1145\/2754930","relation":{},"ISSN":["0734-2071","1557-7333"],"issn-type":[{"type":"print","value":"0734-2071"},{"type":"electronic","value":"1557-7333"}],"subject":[],"published":{"date-parts":[[2015,9,11]]},"assertion":[{"value":"2014-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-09-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}