{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T02:30:07Z","timestamp":1767839407289,"version":"3.49.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2014,6,1]],"date-time":"2014-06-01T00:00:00Z","timestamp":1401580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61103016, 61025009, and 61170045"],"award-info":[{"award-number":["61103016, 61025009, and 61170045"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004735","name":"Natural Science Foundation of Hunan Province","doi-asserted-by":"publisher","award":["12JJ4070"],"award-info":[{"award-number":["12JJ4070"]}],"id":[{"id":"10.13039\/501100004735","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Research Project","award":["JC120601"],"award-info":[{"award-number":["JC120601"]}]},{"DOI":"10.13039\/501100002338","name":"Ministry of Education of the People's Republic of China","doi-asserted-by":"publisher","award":["2.01E+13"],"award-info":[{"award-number":["2.01E+13"]}],"id":[{"id":"10.13039\/501100002338","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2014,6]]},"abstract":"<jats:p>Multicore architectures with Network-on-Chips (NoCs) have been widely recognized as the de facto design for the efficient utilization of the continuously increasing density of transistors on a chip. A key challenge in designing such an NoC-based multicore processor is maintaining cache coherence in an efficient manner. 
Directory-based protocols avoid the bandwidth overhead of snoop-based protocols, therefore scaling to a large number of cores. However, conventional directory structures add significant indirection delay to cache-to-cache accesses in larger multicore processors.<\/jats:p>\n          <jats:p>In this article we propose a novel hardware coherence technique, called integrated coherence prediction (ICP). This approach adopts a prediction technique for managing shared data to reduce or eliminate the cache-to-cache delay in coherence accesses. ICP has two unique features that differ from previous coherence prediction techniques. First, ICP introduces a new integrated prediction scheme that combines two kinds of predictors: an owner predictor, which predicts the data writers and avoids the indirection through the directory, and a data predictor, which predicts the access address and prefetches data from remote nodes directly. Second, ICP uses a request replication method to reduce the negative effect of wrong owner prediction operations, thus facilitating overall performance improvement. We present the design and implementation details of the ICP approach. 
Using detailed full-system simulations, we conclude that the ICP provides a cost-effective solution for designing high-performance multicore processors.<\/jats:p>","DOI":"10.1145\/2611756","type":"journal-article","created":{"date-parts":[[2014,6,17]],"date-time":"2014-06-17T12:38:13Z","timestamp":1403008693000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Integrated Coherence Prediction"],"prefix":"10.1145","volume":"19","author":[{"given":"Libo","family":"Huang","sequence":"first","affiliation":[{"name":"National University of Defense Technology, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiying","family":"Wang","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nong","family":"Xiao","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongwen","family":"Wang","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qiang","family":"Dou","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2014,6,23]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 3rd International Symposium on High-Performance Computer Architecture (HPCA'97)","author":"Abdel-Shafi Hazim","unstructured":"Hazim Abdel-Shafi , Jonathan Hall , Sarita V. Adve , and Vikram S. Adve . 1997. An evaluation of fine-grain producer-initiated communication in cache-coherent multiprocessors . 
In Proceedings of the 3rd International Symposium on High-Performance Computer Architecture (HPCA'97) . 204--215. Hazim Abdel-Shafi, Jonathan Hall, Sarita V. Adve, and Vikram S. Adve. 1997. An evaluation of fine-grain producer-initiated communication in cache-coherent multiprocessors. In Proceedings of the 3rd International Symposium on High-Performance Computer Architecture (HPCA'97). 204--215."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/762761.762762"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/645989.674321"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2007.370533"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.82"},{"key":"e_1_2_1_6_1","unstructured":"CACTI. 2013. An integrated cache and memory access time cycle time area leakage and dynamic power model. http:\/\/www.hpl.hp.com\/research\/cacti\/.  CACTI. 2013. An integrated cache and memory access time cycle time area leakage and dynamic power model. http:\/\/www.hpl.hp.com\/research\/cacti\/."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346210"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.23"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.40"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 8th International Conference on Computer Aided Verification (CAV'96)","author":"Dill David L.","year":"1996","unstructured":"David L. Dill . 1996 . The Mur \u03c6 verification system . In Proceedings of the 8th International Conference on Computer Aided Verification (CAV'96) . Springer, 390--393. David L. Dill. 1996. The Mur \u03c6 verification system. In Proceedings of the 8th International Conference on Computer Aided Verification (CAV'96). 
Springer, 390--393."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.27"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2008.4771777"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454138"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2206781.2206797"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2012.06.013"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088154"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/520549.822760"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/325164.325162"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1874620.1874721"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 5th International Symposium on High Performance Computer Architecture (HPCA'99)","author":"Kaxiras Stefanos","unstructured":"Stefanos Kaxiras and James R. Goodman . 1999. Improving CC-NUMA performance using instruction-based prediction . In Proceedings of the 5th International Symposium on High Performance Computer Architecture (HPCA'99) . IEEE Computer Society, 161. Stefanos Kaxiras and James R. Goodman. 1999. Improving CC-NUMA performance using instruction-based prediction. In Proceedings of the 5th International Symposium on High Performance Computer Architecture (HPCA'99). IEEE Computer Society, 161."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.82"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 6th International Symposium on High Performance Computer Architecture (HPCA'00)","author":"Kaxiras Stefanos","year":"2000","unstructured":"Stefanos Kaxiras and Cliff Young . 2000 . Coherence communication prediction in shared-memory multiprocessors . In Proceedings of the 6th International Symposium on High Performance Computer Architecture (HPCA'00) . 156. 
Stefanos Kaxiras and Cliff Young. 2000. Coherence communication prediction in shared-memory multiprocessors. In Proceedings of the 6th International Symposium on High Performance Computer Architecture (HPCA'00). 156."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1882453.1882458"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.553274"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485967"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/191995.192056"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/300979.300994"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264206"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.121510"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2006.4380808"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cdt.2012.0056"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/232973.233006"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859642"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 5th International Symposium on High Performance Computer Architecture (HPCA'99)","author":"Maged","unstructured":"Maged M. Michael and Ashwini K. Nanda. 1999. Design and performance of directory caches for scalable shared memory multiprocessors . In Proceedings of the 5th International Symposium on High Performance Computer Architecture (HPCA'99) . IEEE Computer Society, 142. Maged M. Michael and Ashwini K. Nanda. 1999. Design and performance of directory caches for scalable shared memory multiprocessors. In Proceedings of the 5th International Symposium on High Performance Computer Architecture (HPCA'99). 
IEEE Computer Society, 142."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/279358.279386"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/191995.192014"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the IEEE Symposium on Parallel and Distributed Processing (IPDPS'08)","author":"Ros Alberto","unstructured":"Alberto Ros , Manuel E. Acacio , and Jose M. Garcia . 2008. DiCo-CMP: Efficient cache coherency in tiled CMP architectures . In Proceedings of the IEEE Symposium on Parallel and Distributed Processing (IPDPS'08) . 1--11. Alberto Ros, Manuel E. Acacio, and Jose M. Garcia. 2008. DiCo-CMP: Efficient cache coherency in tiled CMP architectures. In Proceedings of the IEEE Symposium on Parallel and Distributed Processing (IPDPS'08). 1--11."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 6th International Symposium on High Performance Computer Architecture (HPCA'00)","author":"Stets Robert","unstructured":"Robert Stets , Sandhya Dwarkadas , Leonidas I. Kontothanassis , Umit Rencuzogullari , and Michael L. Scott . 2000. The effect of network total order, broadcast, and remote-write capability on network-based shared memory computing . In Proceedings of the 6th International Symposium on High Performance Computer Architecture (HPCA'00) . IEEE Computer Society, 265--276. Robert Stets, Sandhya Dwarkadas, Leonidas I. Kontothanassis, Umit Rencuzogullari, and Michael L. Scott. 2000. The effect of network total order, broadcast, and remote-write capability on network-based shared memory computing. In Proceedings of the 6th International Symposium on High Performance Computer Architecture (HPCA'00). 
IEEE Computer Society, 265--276."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.37"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2007.89"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.223990"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/139669.139709"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2611756","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2611756","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:01:33Z","timestamp":1750230093000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2611756"}},"subtitle":["Towards Efficient Cache Coherence on NoC-Based Multicore Architectures"],"short-title":[],"issued":{"date-parts":[[2014,6]]},"references-count":42,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2014,6]]}},"alternative-id":["10.1145\/2611756"],"URL":"https:\/\/doi.org\/10.1145\/2611756","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,6]]},"assertion":[{"value":"2013-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-06-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}