{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:38:17Z","timestamp":1750307897454,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2007,9,1]],"date-time":"2007-09-01T00:00:00Z","timestamp":1188604800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2007,9]]},"abstract":"<jats:p>Multiprocessor system-on-a-chip (MPSoC) architectures have received a lot of attention in the past years, but few advances in compilation techniques target these architectures. This is particularly true for the exploitation of data locality. Most of the compilation techniques for parallel architectures discussed in the literature are based on a single loop nest. This article presents new techniques that consist in applying loop fusion and tiling to several loop nests and to parallelize the resulting code across different processors. These two techniques reduce the number of memory accesses. However, they increase dependencies and thereby reduce the exploitable parallelism in the code. This article tries to address this contradiction. To optimize the memory space used by temporary arrays, smaller buffers are used as a replacement. Different strategies are studied to optimize the processing time spent accessing these buffers. 
The experiments show that these techniques yield a significant reduction in the number of data cache misses (30%) and in processing time (50%).<\/jats:p>","DOI":"10.1145\/1278349.1278356","type":"journal-article","created":{"date-parts":[[2007,10,14]],"date-time":"2007-10-14T12:41:11Z","timestamp":1192365671000},"page":"43","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["MPSoC memory optimization using program transformation"],"prefix":"10.1145","volume":"12","author":[{"given":"Youcef","family":"Bouchebaba","sequence":"first","affiliation":[{"name":"\u00c9cole Polytechnique de Montr\u00e9al"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bruno","family":"Girodias","sequence":"additional","affiliation":[{"name":"\u00c9cole Polytechnique de Montr\u00e9al"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gabriela","family":"Nicolescu","sequence":"additional","affiliation":[{"name":"\u00c9cole Polytechnique de Montr\u00e9al"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"El Mostapha","family":"Aboulhamid","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Montr\u00e9al"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bruno","family":"Lavigueur","sequence":"additional","affiliation":[{"name":"STMicroelectronics"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pierre","family":"Paulin","sequence":"additional","affiliation":[{"name":"STMicroelectronics"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2007,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/155090.155101"},{"volume-title":"Advances in Languages and Compiler for Parallel Processing","author":"Banerjee 
U.","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025127.1025992"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2006.20"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380240104"},{"volume":"183","volume-title":"Proceedings of the 29th Hawaii International Conference on System Sciences (Hicss'96)","author":"Carr S.","key":"e_1_2_1_7_1"},{"volume-title":"S.","year":"1998","author":"Catthoor F.","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/207110.207145"},{"volume-title":"Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors. IEEE Computer Society, 359","author":"Darte A.","key":"e_1_2_1_10_1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/520793.825721"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/951710.951749"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Darte A. Robert Y. and Vivien F. 2000. Scheduling and Automatic Parallelization. Birkhauser Boston MA.","DOI":"10.1007\/978-1-4612-1362-8"},{"volume-title":"Proceedings of the 28th Annual International Symposium on Microarchitecture","author":"Davidson J. W.","key":"e_1_2_1_14_1"},{"volume-title":"Automatic parallelization in the polytope model. In the Data Parallel Programming Model: Foundations","series-title":"Lecture Notes In Computer Science","author":"Feautrier P.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/500001.500025"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(88)90014-7"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/73560.73588"},{"key":"e_1_2_1_20_1","unstructured":"ITRS. 2003. International Technology Roadmap for Semiconductors."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1067915.1067922"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/581630.581650"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/321406.321418"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/335231.335244"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/645671.665526"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.795218"},{"key":"e_1_2_1_27_1","first-page":"41","article-title":"Linear loop transformations in optimizing compilers for parallel machines","volume":"27","author":"Kulkarni D.","year":"1995","journal-title":"Australian Comput. J."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/53990.54022"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/360827.360844"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/646728.703499"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(98)00029-5"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065579.1065609"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.577265"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1016720.1016767"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/258492.258520"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/143365.143488"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/248209.237140"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1016720.1016735"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/513829.513850"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/365151.365152"},{"volume-title":"The systematic design of systolic arrays","author":"Quinton 
P.","key":"e_1_2_1_41_1"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2004.62"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-004-1459-8"},{"volume":"768","volume-title":"Proceedings of the 6th International Workshop on Languages and Compilers For Parallel Computing. U. Banerjee, D. Gelernter, A. Nicolau, and D. A. Padua, Eds. Lecture Notes In Computer Science","author":"Tu P.","key":"e_1_2_1_44_1"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/113445.113449"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.97902"},{"volume-title":"Loop Tiling for Parallelism","author":"Xue J.","key":"e_1_2_1_47_1"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2004.28"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1278349.1278356","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1278349.1278356","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:47:29Z","timestamp":1750258049000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1278349.1278356"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,9]]},"references-count":46,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2007,9]]}},"alternative-id":["10.1145\/1278349.1278356"],"URL":"https:\/\/doi.org\/10.1145\/1278349.1278356","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2007,9]]},"assertion":[{"value":"2007-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}