{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T20:55:11Z","timestamp":1771707311276,"version":"3.50.1"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J. Comput. Sci. Technol."],"published-print":{"date-parts":[[2021,1]]},"DOI":"10.1007\/s11390-020-0741-6","type":"journal-article","created":{"date-parts":[[2021,2,9]],"date-time":"2021-02-09T01:58:18Z","timestamp":1612835898000},"page":"33-43","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Performance Evaluation of Memory-Centric ARMv8 Many-Core Architectures: A Case Study with Phytium 2000+"],"prefix":"10.1007","volume":"36","author":[{"given":"Jian-Bin","family":"Fang","sequence":"first","affiliation":[]},{"given":"Xiang-Ke","family":"Liao","sequence":"additional","affiliation":[]},{"given":"Chun","family":"Huang","sequence":"additional","affiliation":[]},{"given":"De-Zun","family":"Dong","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,1,30]]},"reference":[{"key":"741_CR1","doi-asserted-by":"publisher","unstructured":"Laurenzano M A, Tiwari A, Cauble-Chantrenne A et al. Characterization and bottleneck analysis of a 64-bit ARMv8 platform. In Proc. the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, April 2016, pp.36-45. 
https:\/\/doi.org\/10.1109\/ISPASS.2016.7482072.","DOI":"10.1109\/ISPASS.2016.7482072"},{"key":"741_CR2","doi-asserted-by":"publisher","unstructured":"Stephens N. ARMv8-a next-generation vector architecture for HPC. In Proc. the 2016 IEEE Hot Chips 28 Symposium, August 2016. https:\/\/doi.org\/10.1109\/HOTCHIPS.2016.7936203.","DOI":"10.1109\/HOTCHIPS.2016.7936203"},{"key":"741_CR3","doi-asserted-by":"publisher","unstructured":"Zhang C. Mars: A 64-core ARMv8 processor. In Proc. the 2015 IEEE Hot Chips 27 Symposium, Aug. 2015. https:\/\/doi.org\/10.1109\/HOTCHIPS.2015.7477454.","DOI":"10.1109\/HOTCHIPS.2015.7477454"},{"key":"741_CR4","doi-asserted-by":"publisher","unstructured":"You X, Yang H, Luan Z, Liu Y, Qian D. Performance evaluation and analysis of linear algebra kernels in the prototype Tianhe-3 cluster. In Proc. the 5th Asian Conference on Supercomputing Frontiers, March 2019, pp.86-105. https:\/\/doi.org\/10.1007\/978-3-030-18645-6_6.","DOI":"10.1007\/978-3-030-18645-6_6"},{"key":"741_CR5","unstructured":"Dongarra J. Report on the Fujitsu Fugaku system. Technical Report, University of Tennessee, 2020. https:\/\/www.icl.utk.edu\/files\/publications\/2020\/icl-utk-1379-2020.pdf, Nov. 2020."},{"key":"741_CR6","doi-asserted-by":"publisher","unstructured":"Molka D, Hackenberg D, Sch\u00f6ne R, M\u00fcller M S. Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system. In Proc. the 18th International Conference on Parallel Architectures and Compilation Techniques, September 2009, pp.261-270. https:\/\/doi.org\/10.1109\/PACT.2009.22.","DOI":"10.1109\/PACT.2009.22"},{"key":"741_CR7","unstructured":"McCalpin J. Memory bandwidth and machine balance in current high performance computers. https:\/\/www.cs.virginia.edu\/stream\/analyses.html, Dec. 2020."},{"key":"741_CR8","doi-asserted-by":"publisher","unstructured":"Kamil S, Husbands P, Oliker L, Shalf J, Yelick K A. 
Impact of modern memory subsystems on cache optimizations for stencil computations. In Proc. the 2005 Workshop on Memory System Performance, June 2005, pp.36-43. https:\/\/doi.org\/10.1145\/1111583.1111589.","DOI":"10.1145\/1111583.1111589"},{"issue":"4","key":"741_CR9","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1145\/1498765.1498785","volume":"52","author":"S Williams","year":"2009","unstructured":"Williams S, Waterman A, Patterson D A. Roofline: An insightful visual performance model for multicore architectures. Commun. ACM, 2009, 52(4): 65-76. https:\/\/doi.org\/10.1145\/1498765.1498785","journal-title":"Commun. ACM"},{"issue":"1","key":"741_CR10","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/L-CA.2013.6","volume":"13","author":"A Ilic","year":"2014","unstructured":"Ilic A, Pratas F, Sousa L. Cache-aware roofline model: Upgrading the loft. IEEE Comput. Archit. Lett., 2014, 13(1): 21-24. https:\/\/doi.org\/10.1109\/L-CA.2013.6.","journal-title":"IEEE Comput. Archit. Lett."},{"key":"741_CR11","doi-asserted-by":"publisher","unstructured":"Liu X, Buono D, Checconi F, Choi J W, Que X, Petrini F, Gunnels J A, Stuecheli J. An early performance study of large-scale POWER8 SMP systems. In Proc. the 2016 IEEE International Parallel and Distributed Processing Symposium, May 2016, pp.263-272. https:\/\/doi.org\/10.1109\/IPDPS.2016.14.","DOI":"10.1109\/IPDPS.2016.14"},{"key":"741_CR12","doi-asserted-by":"publisher","unstructured":"Goto K, van de Geijn R A. Anatomy of high performance matrix multiplication. ACM Trans. Math. Softw., 2008, 34(3): Article No. 12. https:\/\/doi.org\/10.1145\/1356052.1356053.","DOI":"10.1145\/1356052.1356053"},{"key":"741_CR13","doi-asserted-by":"publisher","unstructured":"Frison G, Kouzoupis D, Sartor T, Zanelli A, Diehl M. BLASFEO: Basic linear algebra subroutines for embedded optimization. ACM Trans. Math. Softw., 2018, 44(4): Article No. 42. 
https:\/\/doi.org\/10.1145\/3210754.","DOI":"10.1145\/3210754"},{"key":"741_CR14","doi-asserted-by":"publisher","unstructured":"Su X, Liao X, Jiang H, Yang C, Xue J. SCP: Shared cache partitioning for high-performance GEMM. ACM Transactions on Architecture and Code Optimization, 2019, 15(4): Article No. 43. https:\/\/doi.org\/10.1145\/3274654.","DOI":"10.1145\/3274654"},{"key":"741_CR15","doi-asserted-by":"publisher","unstructured":"Hollowell C, Caramarcu C, Strecker-Kellogg W, Wong A, Zaytsev A. The effect of NUMA tunings on CPU performance. Journal of Physics: Conference Series, 2015, 664(9): Article No. 092010. https:\/\/doi.org\/10.1088\/1742-6596\/664\/9\/092010.","DOI":"10.1088\/1742-6596\/664\/9\/092010"},{"key":"741_CR16","doi-asserted-by":"publisher","unstructured":"Liu W, Vinter B. CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication. In Proc. the 29th ACM on International Conference on Supercomputing, June 2015, pp.339-350. https:\/\/doi.org\/10.1145\/2751205.2751209.","DOI":"10.1145\/2751205.2751209"},{"key":"741_CR17","unstructured":"Grimes R, Kincaid D, Young D. ITPACK 2.0 user\u2019s guide. Technical Report, Center for Numerical Analysis, University of Texas, 1979."},{"issue":"5","key":"741_CR18","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1137\/130930352","volume":"36","author":"M Kreutzer","year":"2014","unstructured":"Kreutzer M, Hager G, Wellein G, Fehske H, Bishop A R. A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units. SIAM J. Sci. Comput., 2014, 36(5): 401-423. https:\/\/doi.org\/10.1137\/130930352.","journal-title":"SIAM J. Sci. Comput."},{"key":"741_CR19","doi-asserted-by":"publisher","unstructured":"Bell N, Garland M. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In Proc. the ACM\/IEEE Conference on High Performance Computing, November 2009. 
https:\/\/doi.org\/10.1145\/1654059.1654078.","DOI":"10.1145\/1654059.1654078"},{"issue":"1","key":"741_CR20","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1007\/s10766-019-00646-x","volume":"48","author":"D Chen","year":"2020","unstructured":"Chen D, Fang J, Xu C, Chen S, Wang Z. Characterizing scalability of sparse matrix-vector multiplications on Phytium FT-2000+. Int. J. Parallel Program., 2020, 48(1): 80-97. https:\/\/doi.org\/10.1007\/s10766-019-00646-x.","journal-title":"Int. J. Parallel Program."},{"issue":"3","key":"741_CR21","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1007\/s10766-018-00625-8","volume":"47","author":"D Chen","year":"2019","unstructured":"Chen D, Fang J, Chen S, Xu C, Wang Z. Optimizing sparse matrix-vector multiplications on an ARMv8-based many-core architecture. Int. J. Parallel Program., 2019, 47(3): 418-432. https:\/\/doi.org\/10.1007\/s10766-018-00625-8.","journal-title":"Int. J. Parallel Program."},{"key":"741_CR22","doi-asserted-by":"publisher","unstructured":"Chen S, Fang J, Chen D, Xu C, Wang Z. Adaptive optimization of sparse matrix-vector multiplication on emerging many-core architectures. In Proc. the 20th IEEE International Conference on High Performance Computing, June 2018, pp.649-658. https:\/\/doi.org\/10.1109\/HPCC\/SmartCity\/DSS.2018.00116.","DOI":"10.1109\/HPCC\/SmartCity\/DSS.2018.00116"},{"key":"741_CR23","doi-asserted-by":"publisher","unstructured":"Babka V, Tuma P. Investigating cache parameters of x86 family processors. In Proc. the 2009 SPEC Benchmark Workshop, January 2009, pp.77-96. https:\/\/doi.org\/10.1007\/978-3-540-93799-9_5.","DOI":"10.1007\/978-3-540-93799-9_5"},{"key":"741_CR24","doi-asserted-by":"publisher","unstructured":"Fang J, Sips H J, Zhang L, Xu C, Che Y, Varbanescu A L. Test-driving Intel Xeon Phi. In Proc. the 5th ACM\/SPEC International Conference on Performance Engineering, March 2014, pp.137-148. 
https:\/\/doi.org\/10.1145\/2568088.2576799.","DOI":"10.1145\/2568088.2576799"},{"key":"741_CR25","doi-asserted-by":"publisher","unstructured":"Ramos S, Hoefler T. Modeling communication in cache-coherent SMP systems: A case-study with Xeon Phi. In Proc. the 22nd International Symposium on High-Performance Parallel and Distributed Computing, June 2013, pp.97-108. https:\/\/doi.org\/10.1145\/2462902.2462916.","DOI":"10.1145\/2462902.2462916"}],"container-title":["Journal of Computer Science and Technology"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11390-020-0741-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s11390-020-0741-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11390-020-0741-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,9]],"date-time":"2021-02-09T01:59:09Z","timestamp":1612835949000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s11390-020-0741-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["741"],"URL":"https:\/\/doi.org\/10.1007\/s11390-020-0741-6","relation":{},"ISSN":["1000-9000","1860-4749"],"issn-type":[{"value":"1000-9000","type":"print"},{"value":"1860-4749","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"24 June 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 December 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"30 January 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}