{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T20:55:10Z","timestamp":1771707310839,"version":"3.50.1"},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T00:00:00Z","timestamp":1701302400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T00:00:00Z","timestamp":1701302400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J. Comput. Sci. Technol."],"published-print":{"date-parts":[[2023,12]]},"DOI":"10.1007\/s11390-021-1251-x","type":"journal-article","created":{"date-parts":[[2024,1,31]],"date-time":"2024-01-31T13:02:41Z","timestamp":1706706161000},"page":"1323-1338","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["wrBench: Comparing Cache Architectures and Coherency Protocols on ARMv8 Many-Core Systems"],"prefix":"10.1007","volume":"38","author":[{"given":"Wan-Rong","family":"Gao","sequence":"first","affiliation":[]},{"given":"Jian-Bin","family":"Fang","sequence":"additional","affiliation":[]},{"given":"Chun","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Chuan-Fu","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Zheng","family":"Wang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,11,30]]},"reference":[{"key":"1251_CR1","doi-asserted-by":"publisher","unstructured":"Laurenzano M A, Tiwari A, Cauble-Chantrenne A, Jundt A, Ward W A, Campbell R, Carrington L. Characterization and bottleneck analysis of a 64-bit ARMv8 platform. In Proc. the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, April 2016, pp.36\u201345. https:\/\/doi.org\/10.1109\/ISPASS.2016.7482072.","DOI":"10.1109\/ISPASS.2016.7482072"},{"key":"1251_CR2","doi-asserted-by":"publisher","unstructured":"Stephens N. ARMv8-A next-generation vector architecture for HPC. In Proc. the 2016 IEEE Hot Chips 28 Symposium, Aug. 2016. https:\/\/doi.org\/10.1109\/HOTCHIPS.2016.7936203.","DOI":"10.1109\/HOTCHIPS.2016.7936203"},{"key":"1251_CR3","doi-asserted-by":"publisher","unstructured":"Zhang C. Mars: A 64-core ARMv8 processor. In Proc. the 2015 IEEE Hot Chips 27 Symposium, Aug. 2015. https:\/\/doi.org\/10.1109\/HOTCHIPS.2015.7477454.","DOI":"10.1109\/HOTCHIPS.2015.7477454"},{"key":"1251_CR4","doi-asserted-by":"publisher","unstructured":"Arima E, Kodama Y, Odajima T, Tsuji M, Sato M. Power\/Performance\/Area evaluations for next-generation HPC processors using the A64FX chip. In Proc. the 2021 IEEE Symposium in Low-Power and High-Speed Chips, Apr. 2021. https:\/\/doi.org\/10.1109\/COOLCHIPS52128.2021.9410320.","DOI":"10.1109\/COOLCHIPS52128.2021.9410320"},{"key":"1251_CR5","doi-asserted-by":"publisher","unstructured":"Odajima T, Kodama Y, Tsuji M, Matsuda M, Maruyama Y, Sato M. Preliminary performance evaluation of the Fujitsu A64FX using HPC applications. In Proc. the 2020 IEEE International Conference on Cluster Computing, Sept. 2020, pp.523\u2013530. https:\/\/doi.org\/10.1109\/CLUSTER49012.2020.00075.","DOI":"10.1109\/CLUSTER49012.2020.00075"},{"key":"1251_CR6","doi-asserted-by":"publisher","unstructured":"Pedretti K T, Younge A J, Hammond S D, Laros III J H, Curry M L, Aguilar M J, Hoekstra R J, Brightwell R. Chronicles of Astra: Challenges and lessons from the first Petascale Arm supercomputer. In Proc. the International Conference for High Performance Computing, Networking, Storage and Analysis, Nov. 2020. https:\/\/doi.org\/10.1109\/SC41405.2020.00052.","DOI":"10.1109\/SC41405.2020.00052"},{"key":"1251_CR7","doi-asserted-by":"publisher","first-page":"800","DOI":"10.1016\/j.future.2020.06.033","volume":"112","author":"F Mantovani","year":"2020","unstructured":"Mantovani F, Garcia-Gasulla M, Gracia J, Stafford E, Banchelli F, Josep-Fabrego M, Criado-Ledesma J, Nachtmann M. Performance and energy consumption of HPC workloads on a cluster based on Arm ThunderX2 CPU. Future Gener. Comput. Syst., 2020, 112: 800\u2013818. https:\/\/doi.org\/10.1016\/j.future.2020.06.033.","journal-title":"Future Gener. Comput. Syst."},{"issue":"7","key":"1251_CR8","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1109\/MC.2008.209","volume":"41","author":"MD Hill","year":"2008","unstructured":"Hill M D, Marty M R. Amdahl\u2019s law in the multicore era. IEEE Computer, 2008, 41(7): 33\u201338. https:\/\/doi.org\/10.1109\/MC.2008.209.","journal-title":"IEEE Computer"},{"key":"1251_CR9","unstructured":"McCalpin J D. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture Newsletter, 1995, 2: 19\u201325."},{"key":"1251_CR10","unstructured":"McVoy L M, Staelin C. lmbench: Portable tools for performance analysis. In Proc. the USENIX Annual Technical Conference, Jan. 1996, pp.279\u2013294."},{"key":"1251_CR11","doi-asserted-by":"publisher","unstructured":"Molka D, Hackenberg D, Sch\u00f6ne R, M\u00fcller M S. Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system. In Proc. the 18th International Conference on Parallel Architectures and Compilation Techniques, Sept. 2009, pp.261\u2013270. https:\/\/doi.org\/10.1109\/PACT.2009.22.","DOI":"10.1109\/PACT.2009.22"},{"key":"1251_CR12","doi-asserted-by":"publisher","unstructured":"Ramos S, Hoefler T. Modeling communication in cache-coherent SMP systems: A case-study with Xeon Phi. In Proc. the 22nd International Symposium on High-Performance Parallel and Distributed Computing, Jun. 2013, pp.97\u2013108. https:\/\/doi.org\/10.1145\/2493123.2462916.","DOI":"10.1145\/2493123.2462916"},{"key":"1251_CR13","doi-asserted-by":"publisher","unstructured":"Fang J, Sips H J, Zhang L, Xu C, Che Y, Varbanescu A L. Test-driving Intel Xeon Phi. In Proc. the ACM\/SPEC International Conference on Performance Engineering, Mar. 2014, pp.137\u2013148. https:\/\/doi.org\/10.1145\/2568088.2576799.","DOI":"10.1145\/2568088.2576799"},{"issue":"1","key":"1251_CR14","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/s11390-020-0741-6","volume":"36","author":"J Fang","year":"2021","unstructured":"Fang J, Liao X, Huang C, Dong D. Performance evaluation of memory-centric ARMv8 many-core architectures: A case study with Phytium 2000+. Journal of Computer Science and Technology, 2021, 36(1): 33\u201343. https:\/\/doi.org\/10.1007\/s11390-020-0741-6.","journal-title":"Journal of Computer Science and Technology"},{"issue":"5","key":"1251_CR15","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1109\/MM.2021.3085578","volume":"41","author":"J Xia","year":"2021","unstructured":"Xia J, Cheng C, Zhou X, Hu Y, Chun P. Kunpeng 920: The first 7-nm chiplet-based 64-Core ARM SoC for cloud services. IEEE Micro, 2021, 41(5): 67\u201375. https:\/\/doi.org\/10.1109\/MM.2021.3085578.","journal-title":"IEEE Micro"},{"key":"1251_CR16","doi-asserted-by":"publisher","unstructured":"Hackenberg D, Molka D, Nagel W E. Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems. In Proc. the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture, Dec. 2009, pp.413\u2013422. https:\/\/doi.org\/10.1145\/1669112.1669165.","DOI":"10.1145\/1669112.1669165"},{"key":"1251_CR17","doi-asserted-by":"publisher","unstructured":"Ballard G, Druinsky A, Knight N, Schwartz O. Hypergraph partitioning for sparse matrix-matrix multiplication. ACM Trans. Parallel Comput. 2016, 3(3): Article 18. https:\/\/doi.org\/10.1145\/3015144.","DOI":"10.1145\/3015144"},{"key":"1251_CR18","doi-asserted-by":"publisher","unstructured":"Babka V, T\u016fma P. Investigating cache parameters of x86 family processors. In Proc. the SPEC Benchmark Workshop, Jan. 2009, pp.77\u201396. https:\/\/doi.org\/10.1007\/978-3-540-93799-9_5.","DOI":"10.1007\/978-3-540-93799-9_5"},{"key":"1251_CR19","doi-asserted-by":"publisher","unstructured":"Wong H, Papadopoulou M, Sadooghi-Alvandi M. Demystifying GPU microarchitecture through microbenchmarking. In Proc. the 2010 IEEE International Symposium on Performance Analysis of Systems Software, Mar. 2010, pp.235\u2013246. https:\/\/doi.org\/10.1109\/ISPASS.2010.5452013.","DOI":"10.1109\/ISPASS.2010.5452013"},{"issue":"1","key":"1251_CR20","doi-asserted-by":"publisher","first-page":"72","DOI":"10.1109\/TPDS.2016.2549523","volume":"28","author":"X Mei","year":"2017","unstructured":"Mei X, Chu X. Dissecting GPU memory hierarchy through microbenchmarking. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(1): 72\u201386. https:\/\/doi.org\/10.1109\/TPDS.2016.2549523.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"1251_CR21","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1016\/j.parco.2018.06.001","volume":"77","author":"J Lin","year":"2018","unstructured":"Lin J, Xu Z, Cai L, Nukada A, Matsuoka S. Evaluating the SW26010 many-core processor with a micro-benchmark suite for performance optimizations. Parallel Computing, 2018, 77: 128\u2013143. https:\/\/doi.org\/10.1016\/j.parco.2018.06.001.","journal-title":"Parallel Computing"},{"key":"1251_CR22","doi-asserted-by":"publisher","unstructured":"McIntosh-Smith S, Price J, Deakin T, Poenaru A. A performance analysis of the first generation of HPC-optimized Arm processors. Concurrency and Computation: Practice and Experience, 2019, 31(16): e5110. https:\/\/doi.org\/10.1002\/cpe.5110.","DOI":"10.1002\/cpe.5110"}],"container-title":["Journal of Computer Science and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11390-021-1251-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11390-021-1251-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11390-021-1251-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,31]],"date-time":"2024-01-31T13:20:55Z","timestamp":1706707255000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11390-021-1251-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,30]]},"references-count":22,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["1251"],"URL":"https:\/\/doi.org\/10.1007\/s11390-021-1251-x","relation":{},"ISSN":["1000-9000","1860-4749"],"issn-type":[{"value":"1000-9000","type":"print"},{"value":"1860-4749","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,30]]},"assertion":[{"value":"31 December 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 November 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 November 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}