{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:16:26Z","timestamp":1750220186178,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":29,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,4,9]],"date-time":"2022-04-09T00:00:00Z","timestamp":1649462400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,4,9]]},"DOI":"10.1145\/3489525.3511673","type":"proceedings-article","created":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T22:11:46Z","timestamp":1648246306000},"page":"77-88","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Isolating GPU Architectural Features Using Parallelism-Aware Microbenchmarks"],"prefix":"10.1145","author":[{"given":"Rico","family":"van Stigt","sequence":"first","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephen Nicholas","family":"Swatman","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ana-Lucia","family":"Varbanescu","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,4,9]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS.2017.00017"},{"key":"e_1_3_2_1_2_1","unstructured":"Rob Armstrong Arthy Sundaram and Fred Oh. 2021. Revealing New Features in the CUDA 11.5 Toolkit . NVIDIA corporation. 
https:\/\/developer.nvidia.com\/blog\/revealing-new-features-in-the-cuda-11--5-toolkit\/"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/506106.506115"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/SIES.2016.7509423"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3381039"},{"volume-title":"Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units","author":"Danalis Anthony","key":"e_1_3_2_1_7_1","unstructured":"Anthony Danalis, Gabriel Marin, Collin McCurdy, Jeremy S. Meredith, Philip C. Roth, Kyle Spafford, Vinod Tipparaju, and Jeffrey S. Vetter. 2010. The Scalable Heterogeneous Computing (SHOC) Benchmark Suite. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (Pittsburgh, Pennsylvania, USA) (GPGPU-3). Association for Computing Machinery, New York, NY, USA, 63--74. https:\/\/doi.org\/10.1145\/1735688.1735702"},{"volume-title":"High Performance Computing , , Michela Taufer","author":"Deakin Tom","key":"e_1_3_2_1_8_1","unstructured":"Tom Deakin, James Price, Matt Martineau, and Simon McIntosh-Smith. 2016. 
GPU-STREAM v2.0: Benchmarking the Achievable Memory Bandwidth of Many-Core Processors Across Diverse Parallel Programming Models. In High Performance Computing , , Michela Taufer, Bernd Mohr, and Julian M. Kunkel (Eds.). Springer International Publishing, Cham, 489--507."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470378"},{"key":"e_1_3_2_1_10_1","unstructured":"Mark Harris. 2016. Mixed-Precision Programming with CUDA 8. https:\/\/developer.nvidia.com\/blog\/mixed-precision-programming-cuda-8\/"},{"key":"e_1_3_2_1_11_1","volume-title":"Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics. In 2006 IEEE International Symposium on Workload Characterization. IEEE","author":"Hoste Kenneth","year":"2006","unstructured":"Kenneth Hoste and Lieven Eeckhout. 2006. Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics. In 2006 IEEE International Symposium on Workload Characterization. IEEE, San Jose, California, United States of America, 83--92. https:\/\/doi.org\/10.1109\/iiswc.2006.302732"},{"key":"e_1_3_2_1_12_1","volume-title":"Dissecting the NVidia Turing T4 GPU via Microbenchmarking. 
arxiv","author":"Jia Zhe","year":"2019","unstructured":"Zhe Jia, Marco Maggioni, Jeffrey Smith, and Daniele Paolo Scarpazza. 2019. Dissecting the NVidia Turing T4 GPU via Microbenchmarking. arxiv: 1903.07486 [cs.DC]"},{"key":"e_1_3_2_1_13_1","volume-title":"Scarpazza","author":"Jia Zhe","year":"2018","unstructured":"Zhe Jia, Marco Maggioni, Benjamin Staiger, and Daniele P. Scarpazza. 2018. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. arxiv: 1804.06826 [cs.DC]"},{"key":"e_1_3_2_1_14_1","volume-title":"SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance. In High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation , , Stephen A","author":"Juckeland Guido","year":"2015","unstructured":"Guido Juckeland, William Brantley, Sunita Chandrasekaran, Barbara Chapman, Shuai Che, Mathew Colgrove, Huiyu Feng, Alexander Grund, Robert Henschel, Wen-Mei W. Hwu, Huian Li, Matthias S. M\u00fcller, Wolfgang E. Nagel, Maxim Perminov, Pavel Shelepugin, Kevin Skadron, John Stratton, Alexey Titov, Ke Wang, Matthijs van Waveren, Brian Whitney, Sandra Wienke, Rengan Xu, and Kalyan Kumaran. 2015. SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance. In High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation , , Stephen A. Jarvis, Steven A. Wright, and Simon D. Hammond (Eds.). Springer International Publishing, Cham, 46--67. 
"},{"key":"e_1_3_2_1_15_1","volume-title":"Performance & Precision. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) . IEEE","author":"Markidis Stefano","year":"2018","unstructured":"Stefano Markidis, Steven Wei Der Chien, Erwin Laure, Ivy Bo Peng, and Jeffrey S. Vetter. 2018. NVIDIA Tensor Core Programmability, Performance & Precision. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, Vancouver, British Columbia, Canada, 522--531. https:\/\/doi.org\/10.1109\/ipdpsw.2018.00091"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/tpds.2016.2549523"},{"key":"e_1_3_2_1_17_1","volume-title":"Benchmarking the Memory Hierarchy of Modern GPUs. In 2014 IFIP International Conference on Network and Parallel Computing. 
Springer Berlin Heidelberg","author":"Mei Xinxin","year":"2014","unstructured":"Xinxin Mei, Kaiyong Zhao, Chengjian Liu, and Xiaowen Chu. 2014. Benchmarking the Memory Hierarchy of Modern GPUs. In 2014 IFIP International Conference on Network and Parallel Computing. Springer Berlin Heidelberg, Yilan, Taiwan, 144--156. https:\/\/doi.org\/10.1007\/978-3-662-44917-2_13"},{"key":"e_1_3_2_1_18_1","unstructured":"NVIDIA corporation. 2016a. Geforce GTX 1080 Whitepaper. NVIDIA corporation. http:\/\/international.download.nvidia.com\/geforce-com\/international\/pdfs\/GeForce_GTX_1080_Whitepaper_FINAL.pdf"},{"key":"e_1_3_2_1_19_1","unstructured":"NVIDIA corporation. 2016b. Nvidia Pascal Tuning Guide. NVIDIA corporation. https:\/\/docs.nvidia.com\/cuda\/pascal-tuning-guide\/index.html"},{"key":"e_1_3_2_1_20_1","unstructured":"NVIDIA corporation. 2016c. Nvidia Tesla P100 Whitepaper. NVIDIA corporation. https:\/\/images.nvidia.com\/content\/pdf\/tesla\/whitepaper\/pascal-architecture-whitepaper.pdf"},{"key":"e_1_3_2_1_21_1","unstructured":"NVIDIA corporation. 2018. Nvidia Turing GPU Architecture. NVIDIA corporation. 
https:\/\/images.nvidia.com\/aem-dam\/en-zz\/Solutions\/design-visualization\/technologies\/turing-architecture\/NVIDIA-Turing-Architecture-Whitepaper.pdf"},{"key":"e_1_3_2_1_22_1","unstructured":"NVIDIA corporation. 2020a. Nvidia A100 Tensor Core GPU Architecture. NVIDIA corporation. https:\/\/images.nvidia.com\/aem-dam\/en-zz\/Solutions\/data-center\/nvidia-ampere-architecture-whitepaper.pdf"},{"key":"e_1_3_2_1_23_1","unstructured":"NVIDIA corporation. 2020b. Nvidia Ampere GA102 GPU Architecture. NVIDIA corporation. https:\/\/www.nvidia.com\/content\/PDF\/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf"},{"key":"e_1_3_2_1_24_1","unstructured":"NVIDIA corporation. 2021a. CUDA C"},{"key":"e_1_3_2_1_25_1","unstructured":"Programming Guide. NVIDIA corporation. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html"},{"key":"e_1_3_2_1_26_1","unstructured":"NVIDIA corporation. 2021b. Parallel Thread Execution ISA Version 7.3. NVIDIA corporation. https:\/\/docs.nvidia.com\/cuda\/parallel-thread-execution\/index.html"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2007.00930.x"},{"volume-title":"Exposing GPU Architecture Characteristics using Microbenchmarking . Master's thesis","author":"van Stigt Rico","key":"e_1_3_2_1_28_1","unstructured":"Rico van Stigt. 2021. Exposing GPU Architecture Characteristics using Microbenchmarking. Master's thesis. University of Amsterdam."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2010.5452013"}],"event":{"name":"ICPE '22: ACM\/SPEC International Conference on Performance Engineering","sponsor":["SIGMETRICS ACM Special Interest Group on Measurement and Evaluation","SIGSOFT ACM Special Interest Group on Software Engineering"],"location":"Beijing China","acronym":"ICPE '22"},"container-title":["Proceedings of the 2022 ACM\/SPEC on International Conference on Performance 
Engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3489525.3511673","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3489525.3511673","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:23Z","timestamp":1750186943000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3489525.3511673"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,9]]},"references-count":29,"alternative-id":["10.1145\/3489525.3511673","10.1145\/3489525"],"URL":"https:\/\/doi.org\/10.1145\/3489525.3511673","relation":{},"subject":[],"published":{"date-parts":[[2022,4,9]]},"assertion":[{"value":"2022-04-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}