{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T23:16:22Z","timestamp":1776122182068,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"4s","license":[{"start":{"date-parts":[[2014,4,1]],"date-time":"2014-04-01T00:00:00Z","timestamp":1396310400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Research Training Group 1773 \u201cHeterogeneous Image Systems\u201d"},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2014,7]]},"abstract":"<jats:p>We introduce a novel class of massively parallel processor architectures called invasive Tightly-Coupled Processor Arrays (TCPAs). The presented processor class is a highly parameterizable template which can be tailored before runtime to fulfill customers' requirements such as performance, area cost, and energy efficiency. These programmable accelerators are well suited for domain-specific computing from the areas of signal, image, and video processing as well as other streaming processing applications. To overcome future scaling issues (e.g., power consumption, reliability, resource management, as well as application parallelization and mapping), TCPAs are inherently designed in a way that they support self-adaptivity and resource awareness at hardware level. Here, we follow a recently introduced resource-aware parallel computing paradigm called invasive computing where an application can dynamically claim, execute, and release the resources. 
Furthermore, we show how invasive computing can be used as an enabler for power management. For the first time, we present a seamless mapping flow for TCPAs, based on a domain-specific language. Moreover, we outline a complete symbolic mapping approach. Finally, we support our claims by comparing a TCPA against an ARM Mali-T604 GPU in terms of performance and energy efficiency.<\/jats:p>","DOI":"10.1145\/2584660","type":"journal-article","created":{"date-parts":[[2014,4,29]],"date-time":"2014-04-29T12:32:32Z","timestamp":1398774752000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":57,"title":["Invasive Tightly-Coupled Processor Arrays"],"prefix":"10.1145","volume":"13","author":[{"given":"Frank","family":"Hannig","sequence":"first","affiliation":[{"name":"University of Erlangen-Nuremberg, Germany"}]},{"given":"Vahid","family":"Lari","sequence":"additional","affiliation":[{"name":"University of Erlangen-Nuremberg, Germany"}]},{"given":"Srinivas","family":"Boppu","sequence":"additional","affiliation":[{"name":"University of Erlangen-Nuremberg, Germany"}]},{"given":"Alexandru","family":"Tanase","sequence":"additional","affiliation":[{"name":"University of Erlangen-Nuremberg, Germany"}]},{"given":"Oliver","family":"Reiche","sequence":"additional","affiliation":[{"name":"University of Erlangen-Nuremberg, 
Germany"}]}],"member":"320","published-online":{"date-parts":[[2014,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1024499601571"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ReConFig.2011.91"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/1786054.1786063"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2007.92"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1403375.1403555"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/11532378_4"},{"key":"e_1_2_1_7_1","unstructured":"Andrew Duller, Gajinder Panesar, and Daniel Towner. 2003. Parallel processing\u2014The picoChip way! In Communicating Process Architectures. IOS Press, 125--138."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/PARELEC.2006.43"},{"key":"e_1_2_1_9_1","volume-title":"Encyclopedia of Parallel Computing","author":"Feautrier Paul"},{"key":"e_1_2_1_10_1","unstructured":"Martin Fowler. 2010. Domain Specific Languages. 1st Ed. Addison-Wesley Professional."},{"key":"e_1_2_1_11_1","unstructured":"GCC. 2013. The GNU Compiler Collection. http:\/\/gcc.gnu.org."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.18"},{"key":"e_1_2_1_13_1","volume-title":"Adapteva: More flops, less watts: Epiphany offers floating-point accelerator for mobile processors. 
Microprocessor Report 2","author":"Gwennap Linley","year":"2011"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-78610-8_30"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2010.5681464"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025114.1025160"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASPDAC.2012.6164944"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2010.5434077"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the International Federation for Information Processing Congress (IFIP'74)","author":"Kahn Gilles","year":"1974"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.38"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2006.270293"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1166\/jolpe.2009.1008"},{"key":"e_1_2_1_23_1","unstructured":"Peter Kogge, Keren Bergman, Shekhar Borkar, Dan Campbell, William Carlson, William Dally, Monty Denneau, Paul Franzon, William Harrod, Kerry Hill, Jon Hiller, Sherman Karp, Stephen Keckler, Dean Klein, Robert Lucas, Mark Richards, Al Scarpelli, Steven Scott, Allan Snavely, Thomas Sterling, R. Stanley Williams, and Katherine Yelick. 2008. Exascale computing study: Technology challenges in achieving exascale systems. http:\/\/www.cse.nd.edu\/Reports\/2008\/TR-2008-13.pdf."},
{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-012374287-2.50015-X"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2390191.2390193"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2011.6043240"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/977395.977673"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/780732.780758"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02311229"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.31"},{"key":"e_1_2_1_31_1","volume-title":"Microprocessor Forum, October, In-Stat\/MDR","author":"Motomura Masato"},{"key":"e_1_2_1_32_1","unstructured":"Steven Muchnick. 1997. Advanced Compiler Design and Implementation. Morgan Kaufmann."},{"key":"e_1_2_1_33_1","unstructured":"Aaftab Munshi. 2012. The OpenCL specification version 1.2. Khronos OpenCL Working Group. http:\/\/developer.amd.com\/wordpress\/media\/2012\/10\/opencl-1.2.pdf"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.13"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/192724.192731"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.19"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2011.2158343"},{"key":"e_1_2_1_38_1","unstructured":"Anand Lal Shimpi. 2013. The ARM vs x86 wars have begun: In-depth power analysis of Atom, Krait, and Cortex A15. http:\/\/www.anandtech.com\/show\/6536\/arm-vs-x86-the-real-showdown\/12."},
{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.859540"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1524\/itit.2008.0499"},{"key":"e_1_2_1_41_1","volume-title":"Multiprocessor System-on-Chip: Hardware Design and Tool Integration","author":"Teich J\u00fcrgen"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2013.6567543"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the International Workshop on Algorithms and Parallel VLSI Architectures","volume":"339","author":"Thiele Lothar","year":"1991"},{"key":"e_1_2_1_44_1","unstructured":"Tilera Corporation. 2013. http:\/\/www.tilera.com."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/950162.950167"},{"key":"e_1_2_1_46_1","unstructured":"Michael Joseph Wolfe. 1996. High Performance Compilers for Parallel Computing. Addison-Wesley."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(96)00063-4"},{"key":"e_1_2_1_48_1","unstructured":"Jingling Xue. 2000. Loop Tiling for Parallelism. 
Kluwer Academic Publishers Norwell MA."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2584660","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2584660","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:01:43Z","timestamp":1750230103000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2584660"}},"subtitle":["A Domain-Specific Architecture\/Compiler Co-Design Approach"],"short-title":[],"issued":{"date-parts":[[2014,4]]},"references-count":48,"journal-issue":{"issue":"4s","published-print":{"date-parts":[[2014,7]]}},"alternative-id":["10.1145\/2584660"],"URL":"https:\/\/doi.org\/10.1145\/2584660","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,4]]},"assertion":[{"value":"2013-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-04-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}