{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:50:00Z","timestamp":1750308600210,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2016,5,23]],"date-time":"2016-05-23T00:00:00Z","timestamp":1463961600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2016,7,21]]},"abstract":"<jats:p>This work presents a methodology for efficient exploration of data interleaving and data-to-memory mapping options for Single Instruction Multiple Data (SIMD) platform architectures. The system architecture consists of a reconfigurable clustered scratch-pad memory and a SIMD functional unit, which performs the same operation on multiple input data in parallel. The memory accesses contribute substantially to the overall energy consumption of an embedded system executing a data intensive task. The scope of this work is the reduction of the overall energy consumption by increasing the utilization of the functional units and decreasing the number of memory accesses. The presented methodology is tested using a number of benchmark applications with holes in their access scheme. Potential gains are calculated based on the energy models, both for the processing and the memory part of the system. 
The reduction in energy consumption after efficient interleaving and mapping of data is between 40% and 80% for the complete system and the studied benchmarks.<\/jats:p>","DOI":"10.1145\/2894754","type":"journal-article","created":{"date-parts":[[2016,5,24]],"date-time":"2016-05-24T21:47:58Z","timestamp":1464126478000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Integrated Exploration Methodology for Data Interleaving and Data-to-Memory Mapping on SIMD Architectures"],"prefix":"10.1145","volume":"15","author":[{"given":"Iason","family":"Filippopoulos","sequence":"first","affiliation":[{"name":"Norwegian University of Science and Technology, Trondheim, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Namita","family":"Sharma","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology, Hauz Khas, New Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Francky","family":"Catthoor","sequence":"additional","affiliation":[{"name":"IMEC & K.U. Leuven, Leuven, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Per Gunnar","family":"Kjeldsberg","sequence":"additional","affiliation":[{"name":"Norwegian University of Science and Technology, Trondheim, Norway"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Preeti Ranjan","family":"Panda","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology, Hauz Khas, New Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2016,5,23]]},"reference":[{"volume-title":"Proceedings of the 32nd Annual International Symposium on Microarchitecture (MICRO-32)","author":"Santosh","key":"e_1_2_1_1_1","unstructured":"Santosh G. Abraham and Scott A Mahlke. 1999. Automatic and efficient evaluation of memory hierarchies for embedded systems. 
In Proceedings of the 32nd Annual International Symposium on Microarchitecture (MICRO-32). IEEE, 114--125."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750397"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/54.844336"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/344166.344518"},{"volume-title":"Designing Embedded Processors","author":"Brockmeyer Erik","key":"e_1_2_1_5_1","unstructured":"Erik Brockmeyer, Bart Durinck, Henk Corporaal, and Francky Catthoor. 2007. Layer assignment techniques for low energy in multi-layered memory organizations. In Designing Embedded Processors. Springer, 157--190."},{"key":"e_1_2_1_6_1","volume-title":"Compiler User Manual","author":"Cadence RTL","year":"2014","unstructured":"RTL Cadence. 2014. Compiler User Manual (2014). http:\/\/www.cadence.com\/rl\/Resources\/datasheets\/encounter_rtlcompiler.pdf."},{"volume-title":"Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design","author":"Catthoor Francky","key":"e_1_2_1_7_1","unstructured":"Francky Catthoor, Sven Wuytack, G. E. 
de Greef, Florin Banica, Lode Nachtergaele, and Arnout Vandecappelle. 1998. Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design. Springer."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063401"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/857198.857949"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/1509633.1509814"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1289881.1289915"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10617-014-9145-6"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1155\/ES\/2006\/56320"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/4.535411"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/602902.602999"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2013.2238990"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBCAS.2011.2176726"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ESTMED.2007.4375794"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.543711"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the Conference on European Design Automation. IEEE Computer Society Press, 226--231","author":"Jantsch Axel","year":"1994","unstructured":"Axel Jantsch, Peeter Ellervee, Ahmed Hemani, Johnny \u00d6berg, and Hannu Tenhunen. 1994. Hardware\/software partitioning and minimizing memory interface traffic. In Proceedings of the Conference on European Design Automation. 
IEEE Computer Society Press, 226--231."},{"key":"e_1_2_1_21_1","unstructured":"Wang Kai and Xu Zhiwei. 2003. Synopsys Prime Power Manual Release U-2003.06-QA. (2003)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/603095.603136"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2579677"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2005.2"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDT.2003.1173050"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/43.790618"},{"key":"e_1_2_1_28_1","volume-title":"Peng Yang, Chun Wong, Jos\u00e9 Ignacio G\u00f3mez, Stefaan Himpe, Chantal Ykman-Couvreur, and Francky Catthoor.","author":"Ma Zhe","year":"2007","unstructured":"Zhe Ma, Pol Marchal, Daniele Paolo Scarpazza, Peng Yang, Chun Wong, Jos\u00e9 Ignacio G\u00f3mez, Stefaan Himpe, Chantal Ykman-Couvreur, and Francky Catthoor. 2007. Systematic Methodology for Real-Time Cost-Effective Mapping of Dynamic Concurrent Task-Based Systems on Heterogeneous Platforms. Springer Science & Business Media."},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"A. Macii, L. Benini, and M. Poncino. 2002. Memory Design Techniques for Low-Energy Embedded Systems. 
Kluwer Academic Publishers.","DOI":"10.1007\/978-1-4757-5808-5"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/344166.344610"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 8th International Conference on Parallel and Distributed Computing Systems. Citeseer, 1--8.","author":"Manjikian Naraig","year":"1995","unstructured":"Naraig Manjikian and Tarek Abdelrahman. 1995. Array data layout for the reduction of cache conflicts. In Proceedings of the 8th International Conference on Parallel and Distributed Computing Systems. Citeseer, 1--8."},{"key":"e_1_2_1_32_1","volume-title":"Field-Programmable Technology, 2002.(FPT). Proceedings. 2002 IEEE International Conference on. IEEE, 166--173","author":"Mei Bingfeng","year":"2002","unstructured":"Bingfeng Mei, Serge Vernalde, Diederik Verkest, Hugo De Man, and Rudy Lauwereins. 2002. DRESC: A retargetable compiler for coarse-grained reconfigurable architectures. In Field-Programmable Technology, 2002.(FPT). Proceedings. 2002 IEEE International Conference on. IEEE, 166--173."},{"volume-title":"Proceedings of the 2010 53rd IEEE International Midwest Symposium on Circuits and Systems (MWSCAS\u201910)","author":"Meinerzhagen P.","key":"e_1_2_1_33_1","unstructured":"P. Meinerzhagen, C. Roth, and A. Burg. 2010. Towards generic low-power area-efficient standard cell based memory architectures. 
In Proceedings of the 2010 53rd IEEE International Midwest Symposium on Circuits and Systems (MWSCAS\u201910). IEEE, 129--132."},{"key":"e_1_2_1_34_1","article-title":"Benchmarking of standard-cell based memories in the sub-VT domain in 65-nm CMOS technology","volume":"1","author":"Meinerzhagen Pascal","year":"2011","unstructured":"Pascal Meinerzhagen, S. M. Yasser Sherazi, Andreas Burg, and Joachim Neves Rodrigues. 2011. Benchmarking of standard-cell based memories in the sub-VT domain in 65-nm CMOS technology. IEEE Transactions on Emerging and Selected Topics in Circuits and Systems 1, 2 (2011).","journal-title":"IEEE Transactions on Emerging and Selected Topics in Circuits and Systems"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.suscom.2013.11.001"},{"key":"e_1_2_1_36_1","first-page":"8","article-title":"High-speed memory architectures for multimedia applications. Circuits and Devices Magazine","volume":"13","author":"Oshima Yoichi","year":"1997","unstructured":"Yoichi Oshima, Bing J. Sheu, and Steve H. Jen. 1997. High-speed memory architectures for multimedia applications. 
Circuits and Devices Magazine, IEEE 13, 1 (1997), 8--13.","journal-title":"IEEE"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/375977.375978"},{"volume-title":"Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration","author":"Panda Preeti Ranjan","key":"e_1_2_1_38_1","unstructured":"Preeti Ranjan Panda, Nikil D. Dutt, and Alexandru Nicolau. 1999. Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration. Springer Science & Business Media."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/54.922803"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/645463.757645"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/92.555990"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638128"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2747875"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/309847.310090"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.5555\/882452.874376"},{"key":"e_1_2_1_46_1","volume-title":"Geng Daniel Liu, and Wen-Mei W. Hwu","author":"Sung Jui","year":"2012","unstructured":"I.-Jui Sung, Geng Daniel Liu, and Wen-Mei W. Hwu. 2012. DL: A data layout transformation system for heterogeneous computing. In Innovative Parallel Computing (InPar) 2012. 
IEEE, 1--11."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2894754","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2894754","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:05:40Z","timestamp":1750273540000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2894754"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,5,23]]},"references-count":45,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2016,7,21]]}},"alternative-id":["10.1145\/2894754"],"URL":"https:\/\/doi.org\/10.1145\/2894754","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2016,5,23]]},"assertion":[{"value":"2015-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-05-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}