{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T18:11:00Z","timestamp":1777486260662,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":27,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,9,28]],"date-time":"2020-09-28T00:00:00Z","timestamp":1601251200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100006435","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1710371"],"award-info":[{"award-number":["1710371"]}],"id":[{"id":"10.13039\/100006435","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,9,28]]},"DOI":"10.1145\/3422575.3422794","type":"proceedings-article","created":{"date-parts":[[2021,3,22]],"date-time":"2021-03-22T01:43:40Z","timestamp":1616377420000},"page":"209-222","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Evaluating Gather and Scatter Performance on CPUs and GPUs"],"prefix":"10.1145","author":[{"given":"Patrick","family":"Lavin","sequence":"first","affiliation":[{"name":"Georgia Tech, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeffrey","family":"Young","sequence":"additional","affiliation":[{"name":"Georgia Tech, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Richard","family":"Vuduc","sequence":"additional","affiliation":[{"name":"Georgia Tech, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jason","family":"Riedy","sequence":"additional","affiliation":[{"name":"Lucata Corporation, United 
States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aaron","family":"Vose","sequence":"additional","affiliation":[{"name":"NanoSemi Inc., United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Ernst","sequence":"additional","affiliation":[{"name":"Hewlett Packard Enterprise, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,3,21]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2014. CORAL RFP B604142. https:\/\/asc.llnl.gov\/CORAL\/. Accessed: 2019-04-02."},{"key":"e_1_3_2_1_2_1","unstructured":"2018. CORAL-2 ACQUISITION RFP No. 6400015092. https:\/\/procurement.ornl.gov\/rfp\/CORAL2\/. Accessed: 2019-04-02."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357526.3357574"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1247360.1247401"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063401"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735702"},{"key":"e_1_3_2_1_8_1","volume-title":"GPU-STREAM v2.0: Benchmarking the Achievable Memory Bandwidth of Many-Core Processors Across Diverse Parallel Programming Models","author":"Deakin Tom","unstructured":"
Tom Deakin, James Price, Matt Martineau, and Simon McIntosh-Smith. 2016. GPU-STREAM v2.0: Benchmarking the Achievable Memory Bandwidth of Many-Core Processors Across Diverse Parallel Programming Models. In High Performance Computing, Michela Taufer, Bernd Mohr, and Julian\u00a0M. Kunkel (Eds.). Springer International Publishing, Cham, 489\u2013507."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/3018843.3018845"},{"key":"e_1_3_2_1_10_1","volume-title":"NEKBONE: Thermal Hydraulics mini-application. Nekbone Release 2(2013).","author":"Fischer P","year":"2013","unstructured":"P Fischer and K Heisey. 2013. NEKBONE: Thermal Hydraulics mini-application. Nekbone Release 2(2013)."},{"key":"e_1_3_2_1_11_1","volume-title":"Spector: An OpenCL FPGA benchmark suite. (12","author":"Gautier Quentin","year":"2016","unstructured":"Quentin Gautier, Alric Althoff, Pingfan Meng, and Ryan Kastner. 2016. Spector: An OpenCL FPGA benchmark suite. (12 2016)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318170.3318192"},{"key":"e_1_3_2_1_13_1","volume-title":"An Initial Characterization of the Emu Chick. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 579\u2013588","author":"Hein E.","year":"2018","unstructured":"E. Hein, T. Conte, J. Young, S. Eswar, J. Li, P. Lavin, R. Vuduc, and J. Riedy. 2018. An Initial Characterization of the Emu Chick. 
In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 579\u2013588. https:\/\/doi.org\/10.1109\/IPDPSW.2018.00097"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2010.107"},{"key":"e_1_3_2_1_15_1","volume-title":"Technical Report.","author":"Karlin Ian","unstructured":"Ian Karlin, Jeff Keasler, and JR Neely. 2013. Lulesh 2.0 updates and changes. Technical Report. Lawrence Livermore National Lab.(LLNL), Livermore, CA (United States)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024724.2024754"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3225058.3225095"},{"key":"e_1_3_2_1_19_1","unstructured":"John McCalpin. 2018. Notes on \u201cnon-temporal\u201d (aka \u201cstreaming\u201d) stores. http:\/\/sites.utexas.edu\/jdm4372\/tag\/cache\/."},{"key":"e_1_3_2_1_20_1","volume-title":"Memory Bandwidth and Machine Balance in Current High Performance Computers","author":"McCalpin D.","year":"1995","unstructured":"John\u00a0D. McCalpin. 1995. Memory Bandwidth and Machine Balance in Current High Performance Computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter (Dec. 1995), 19\u201325."},{"key":"e_1_3_2_1_21_1","unstructured":"Mahesh Rajan Douglas\u00a0W Doerfler Mike Tupek and Simon Hammond. 2015. 
An investigation of compiler vectorization on current and next-generation Intel processors using benchmarks and Sandia\u2019s Sierra Applications. (2015)."},{"key":"e_1_3_2_1_22_1","unstructured":"Mark\u00a0K. Seager. 2019. STRIDE CORAL 2 benchmark summary. https:\/\/asc.llnl.gov\/coral-2-benchmarks\/downloads\/STRIDE_Summary_v1.0.pdf."},{"key":"e_1_3_2_1_23_1","volume-title":"The Superstrider Architecture: Integrating Logic and Memory Towards Non-Von Neumann Computing. In 2017 IEEE International Conference on Rebooting Computing (ICRC). 1\u20138. https:\/\/doi.org\/10","author":"Srikanth S.","year":"2017","unstructured":"S. Srikanth, T.\u00a0M. Conte, E.\u00a0P. DeBenedictis, and J. Cook. 2017. The Superstrider Architecture: Integrating Logic and Memory Towards Non-Von Neumann Computing. In 2017 IEEE International Conference on Rebooting Computing (ICRC). 1\u20138. 
https:\/\/doi.org\/10.1109\/ICRC.2017.8123669"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.35"},{"key":"e_1_3_2_1_25_1","volume-title":"Parboil: A revised benchmark suite for scientific and commercial throughput computing","author":"Stratton A","year":"2012","unstructured":"John\u00a0A Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng\u00a0Daniel Liu, and Wen-mei\u00a0W Hwu. 2012. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing 127 (2012)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2005.13"},{"key":"e_1_3_2_1_27_1","volume-title":"Tools for High Performance Computing","author":"Terpstra Dan","year":"2009","unstructured":"Dan Terpstra, Heike Jagode, Haihang You, and Jack Dongarra. 2010. Collecting Performance Data with PAPI-C. In Tools for High Performance Computing 2009, Matthias\u00a0S. M\u00fcller, Michael\u00a0M. Resch, Alexander Schulz, and Wolfgang\u00a0E. Nagel (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 157\u2013173."},{"key":"e_1_3_2_1_28_1","unstructured":"Ulrike Yang Robert Falgout and Jongsoo Park. 2017. Algebraic Multigrid Benchmark Version 00. 
https:\/\/www.osti.gov\/\/servlets\/purl\/1389816"}],"event":{"name":"MEMSYS 2020: The International Symposium on Memory Systems","location":"Washington DC USA","acronym":"MEMSYS 2020"},"container-title":["The International Symposium on Memory Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3422575.3422794","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3422575.3422794","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:55Z","timestamp":1750197715000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3422575.3422794"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,28]]},"references-count":27,"alternative-id":["10.1145\/3422575.3422794","10.1145\/3422575"],"URL":"https:\/\/doi.org\/10.1145\/3422575.3422794","relation":{},"subject":[],"published":{"date-parts":[[2020,9,28]]},"assertion":[{"value":"2021-03-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}