{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:28:16Z","timestamp":1750307296170,"version":"3.41.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2010,11,1]],"date-time":"2010-11-01T00:00:00Z","timestamp":1288569600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2010,11]]},"abstract":"<jats:p>We present a framework for performance-, bandwidth-, and energy-efficient intercore communication in embedded MultiProcessor Systems-on-a-Chip (MPSoC). The methodology seamlessly integrates compiler, operating system, and hardware support to achieve a low-cost communication between synchronized producers and consumers. The technique is especially beneficial for data-streaming applications exploiting pipeline parallelism with computational phases mapped to separate cores. Code transformations utilizing a simple ISA support ensure that producer writes are propagated to consumers with a single interconnect transaction per cache block just prior to the producer exiting its synchronization region. Furthermore, in order to completely eliminate misses to shared data caused by interference with private data and also to minimize the cache energy, we integrate to the proposed framework a cache way partitioning policy based on a simple cache configurability support, which isolates the shared buffers from other cache traffic. This mechanism results in significant power savings since only a subset of the cache ways needs to be looked up for each cache access. 
The end result of the proposed framework is a single communication transaction per shared cache block between a producer and a consumer with no coherence misses on the consumer caches. Our experiments demonstrate significant reductions in interconnect traffic, cache misses, and energy for a set of multiprocessor benchmarks.<\/jats:p>","DOI":"10.1145\/1870109.1870117","type":"journal-article","created":{"date-parts":[[2010,12,1]],"date-time":"2010-12-01T20:18:10Z","timestamp":1291234690000},"page":"1-39","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Energy- and Performance-Efficient Communication Framework for Embedded MPSoCs through Application-Driven Release Consistency"],"prefix":"10.1145","volume":"16","author":[{"given":"Chenjie","family":"Yu","sequence":"first","affiliation":[{"name":"University of Maryland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Petrov","sequence":"additional","affiliation":[{"name":"University of Maryland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2010,11]]},"reference":[{"volume-title":"Proceedings of the 3rd International Symposium on High-Performance Computer Architecture. 204--215","author":"Abdel-Shafi H.","key":"e_1_2_1_1_1","unstructured":"Abdel-Shafi, H., Hall, J., Adve, S., and Adve, V. 1997. An evaluation of fine-grain producer-initiated communication in cache-coherent multiprocessors. In Proceedings of the 3rd International Symposium on High-Performance Computer Architecture. 
204--215."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/320080.320119"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/6513.6514"},{"key":"e_1_2_1_4_1","unstructured":"ARM Ltd. 2010. ARM11 Family. ARM Ltd."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1346281.1346290"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/781131.781144"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.82"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1248377.1248398"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1080695.1069991"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/121132.121159"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-71528-3_14"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346210"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.23"},{"volume-title":"Winning the SOC Revolution","author":"Cumming P.","key":"e_1_2_1_14_1","unstructured":"Cumming, P. 2003. The ti omap platform approach to soc. In Winning the SOC Revolution. 
Kluwer Academic Publishers."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349309"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250734.1250760"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/566408.566471"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/325164.325102"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1278480.1278537"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/379605.379665"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1037947.1024406"},{"volume-title":"Intel XScale Microarchitecture","author":"Intel Corporation","key":"e_1_2_1_22_1","unstructured":"Intel Corporation. Intel XScale Microarchitecture. Intel Corporation."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1006209.1006252"},{"key":"e_1_2_1_24_1","unstructured":"Kennedy K. and Allen J. R. 2002. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. 
Morgan Kaufmann San Francisco CA."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.918001"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/224170.224398"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/161494.161501"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/325164.325132"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/263699.263719"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2005.148"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1057661.1057728"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1151074.1151081"},{"volume-title":"Proceedings of the Conference on Supercomputing (CS\u201992)","author":"Mahlke S. A.","key":"e_1_2_1_33_1","unstructured":"Mahlke, S. A., Chen, W. Y., Gyllenhaal, J. C., and Hwu, W.-M. W. 1992. Compiler code transformations for superscalar-based high performance systems. In Proceedings of the Conference on Supercomputing (CS\u201992). 808--817."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1146909.1146980"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC\u201996)","author":"Montanaro J.","year":"1996","unstructured":"Montanaro, J., Witek, R. T., Anne, K., Black, A. J., Dobberpuhl, D. W., Donahue, P. M., et al. 1996. A 160mhz, 32b 0.5w cmos risc microprocessor. 
In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC\u201996). 214--229."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.42"},{"volume-title":"Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA\u201901)","author":"Moshovos A.","key":"e_1_2_1_37_1","unstructured":"Moshovos, A., Memik, G., Choudhary, A., and Falsafi, B. 2001. Jetty: Filtering snoops for reduced energy consumption in smp servers. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA\u201901). 85--96."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1393921.1393988"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/186025.186041"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301645"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/379539.379553"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.21"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.7"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1289816.1289823"},{"key":"e_1_2_1_45_1","unstructured":"Thoziyoor S. Muralimanohar N. Ahn J. and Jouppi N. 2008. Cacti 5.3. Tech. rep. HP Laboratories Palo Alto California. 
April."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.50"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/996566.996753"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1289816.1289876"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859635"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1870109.1870117","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1870109.1870117","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:59:47Z","timestamp":1750244387000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1870109.1870117"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,11]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,11]]}},"alternative-id":["10.1145\/1870109.1870117"],"URL":"https:\/\/doi.org\/10.1145\/1870109.1870117","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2010,11]]},"assertion":[{"value":"2009-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-11-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}