{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T15:35:53Z","timestamp":1776785753556,"version":"3.51.2"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T00:00:00Z","timestamp":1660780800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>This article proposes and implements a Coarse-grained dynamically Reconfigurable Architecture, named Reconfigurable Multimedia Accelerator (REMAC). REMAC architecture is driven by the pipelined multi-instruction-multi-data execution model for exploiting multi-level parallelism of the computation-intensive loops in multimedia applications. The novel architecture of REMAC's reconfigurable processing unit (RPU) allows multiple iterations of a kernel loop can execute concurrently in the pipelining fashion by the temporal overlapping of the configuration fetch, execution, and store processes as much as possible. To address the huge bandwidth required by parallel processing units, REMAC architecture is proposed to efficiently exploit the abundant data locality in the kernel loops to decrease data access bandwidth while increase the efficiency of pipelined execution. In addition, a novel architecture of dedicated hierarchy data memory system is proposed to increase data reuse between iterations and make data always available for parallel operation of RPU. The proposed architecture was modeled at RTL using VHDL language. Several benchmark applications were mapped onto REMAC to validate the high-flexibility and high-performance of the architecture and prove that it is appropriate for a wide set of multimedia applications. The experimental results show that REMAC's performance is better than Xilinx Virtex-II, ADRES, REMUS-II, and TI C64+ DSP.<\/jats:p>","DOI":"10.1145\/3543544","type":"journal-article","created":{"date-parts":[[2022,7,9]],"date-time":"2022-07-09T11:35:55Z","timestamp":1657366555000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Design and Implementation of a Coarse-grained Dynamically Reconfigurable Multimedia Accelerator"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3417-3447","authenticated-orcid":false,"given":"Hung K.","family":"Nguyen","sequence":"first","affiliation":[{"name":"VNU University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4259-9579","authenticated-orcid":false,"given":"Xuan-Tu","family":"Tran","sequence":"additional","affiliation":[{"name":"VNU The Information Technology Institute (ITI), Vietnam National University, Hanoi, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,8,18]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357375"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPGA.1997.624610"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/2.839324"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.859540"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45234-8_7"},{"key":"e_1_3_1_7_2","volume-title":"Proceedings of the International Conference on Field Programmable Technology","author":"Mei B.","year":"2002","unstructured":"B. Mei, S. Vernalde, D. Verkest, H. D. Man, and R. Lauwereins. 2002. DRESC: A retargetable compiler for coarse-grained reconfigurable architectures. In Proceedings of the International Conference on Field Programmable Technology."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/980152.980156"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454140"},{"issue":"7","key":"e_1_3_1_10_2","first-page":"1858","article-title":"Longxing SHI: Reconfiguration process optimization of dynamically coarse grain reconfigurable architecture for multimedia applications","volume":"95","author":"Bo Liu","year":"2013","unstructured":"Bo Liu, Peng Cao, Min Zhu, Jun Yang, Leibo Liu, and Shaojun Wei. 2013. Longxing SHI: Reconfiguration process optimization of dynamically coarse grain reconfigurable architecture for multimedia applications. IEICE Trans. Inf. Sys. E95-D, 7 (July 2013), 1858\u20131871.","journal-title":"IEICE Trans. Inf. Sys."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1587\/transinf.E96.D.601"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2442116.2442120"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2014.12"},{"issue":"2","key":"e_1_3_1_14_2","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1109\/TVLSI.2015.2400219","article-title":"Improving nested loop pipelining on coarse-grained reconfigurable architectures","volume":"24","author":"Yin S.","year":"2015","unstructured":"S. Yin, D. Liu, Y. Peng, L. Liu, and S. Wei. 2015. Improving nested loop pipelining on coarse-grained reconfigurable architectures. IEEE Trans. Very Large Scale Integr. Syst. 24, 2 (2015), 507\u2013520.","journal-title":"IEEE Trans. Very Large Scale Integr. Syst."},{"key":"e_1_3_1_15_2","volume-title":"Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition (DATE\u201916)","author":"Souza J. D.","year":"2016","unstructured":"J. D. Souza, L. C. M. B. Rutzig, and A. C. S. B. Filho1. 2016. A reconfigurable heterogeneous multicore with a homogeneous ISA. In Proceedings of the Design, Automation, and Test in Europe Conference and Exhibition (DATE\u201916)."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080256"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2018.2797600"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2019.2937738"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE48585.2020.9116408"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE51398.2021.9473955"},{"key":"e_1_3_1_21_2","unstructured":"AMBA Specification (Rev 2.0). [n.d.]. Retrieved from http:\/\/www.arm.com."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0055254"},{"key":"e_1_3_1_23_2","first-page":"81","volume-title":"Compilation Techniques for Reconfigurable Architectures","author":"Cardoso Jo\u00e3o M. P.","year":"2011","unstructured":"Jo\u00e3o M. P. Cardoso and Pedro C. Diniz. 2011. Compilation Techniques for Reconfigurable Architectures. Springer Science & Business Media, 81."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-3636-9"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.752657"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2017.10.016"},{"key":"e_1_3_1_27_2","unstructured":"Datasheet of Intel 8257 Programmable DMA Controller. Retrieved from http:\/\/www.eecs.northwestern.edu\/~ypa448\/Microp\/8257.pdf."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3012084"},{"key":"e_1_3_1_29_2","unstructured":"Lattice Semiconductor Corporation. 2010. Scatter-gather direct memory access controller IP core users guide. Retrieved from https:\/\/www.latticesemi.com."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACV.1994.341300"},{"key":"e_1_3_1_31_2","unstructured":"Altera Corporation. 2009. Scatter-Gather DMA Controller Core Quartus II 9.1. Retrieved from https:\/\/www.intel.com\/."},{"key":"e_1_3_1_32_2","unstructured":"Xilinx. 2010. Channelized Direct Memory Access and Scatter Gather. February 2010."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28365-9_8"},{"key":"e_1_3_1_34_2","unstructured":"Emelie Nilsson. 2013. DMA controller for LEON3 SoC: S Using AMBA. Retrieved from http:\/\/www.diva-portal.org\/."},{"key":"e_1_3_1_35_2","unstructured":"Gaisler Research. 2013. GRLIB IP Core User's Manual. Version 1.3.0-b4133. Retrieved from https:\/\/www.gaisler.com\/products\/grlib\/grip.pdf."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2008.4580186"},{"key":"e_1_3_1_37_2","first-page":"33","article-title":"Self-Partial and dynamic reconfiguration implementation for AES using FPGA","volume":"2","author":"Alaoui Ismaili Z.","year":"2009","unstructured":"Z. Alaoui Ismaili and A. Moussa. 2009. Self-Partial and dynamic reconfiguration implementation for AES using FPGA. IJCSI Int. J. Comput. Sci. Issues 2 (2009), 33\u201340.","journal-title":"IJCSI Int. J. Comput. Sci. Issues"},{"key":"e_1_3_1_38_2","unstructured":"J. Jurely and H. Hakkarainen. 1997. TI's new C6x DSP screams at 1.600 MIPS. Microprocessor Report. 1997."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-0427(00)00413-1"},{"key":"e_1_3_1_40_2","first-page":"5","article-title":"Close approximations of sigmoid functions by sum of step for VLSI implementation of neural networks. Sci","volume":"3","author":"Beiu Valeriu","year":"1994","unstructured":"Valeriu Beiu, J. A. Peperstraete, J. Vandewalle, and R. Lauwereins. 1994. Close approximations of sigmoid functions by sum of step for VLSI implementation of neural networks. Sci. Ann. Cuza Univ. 3 (1994), 5\u201334.","journal-title":"Ann. Cuza Univ."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.5772\/4833"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TENCON.2015.7373165"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3543544","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3543544","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:47Z","timestamp":1750186847000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3543544"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,18]]},"references-count":41,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3543544"],"URL":"https:\/\/doi.org\/10.1145\/3543544","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"value":"2329-4949","type":"print"},{"value":"2329-4957","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,18]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-08-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}