{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:27:41Z","timestamp":1750220861026,"version":"3.41.0"},"reference-count":79,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T00:00:00Z","timestamp":1570752000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Honeywell International, Inc."},{"name":"US Department of Energy's National Nuclear Security Administration","award":["DE-NA-0003525"],"award-info":[{"award-number":["DE-NA-0003525"]}]},{"name":"National Technology and Engineering Solutions of Sandia, LLC"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2019,12,31]]},"abstract":"<jats:p>Reduction is an operation performed on the values of two or more key-value pairs that share the same key. Reduction of sparse data streams finds application in a wide variety of domains such as data and graph analytics, cybersecurity, machine learning, and HPC applications. However, these applications exhibit low locality of reference, rendering traditional architectures and data representations inefficient. This article presents MetaStrider, a significant algorithmic and architectural enhancement to the state-of-the-art, SuperStrider. 
Furthermore, these enhancements enable a variety of parallel, memory-centric architectures that we propose, resulting in demonstrated performance that scales near-linearly with available memory-level parallelism.<\/jats:p>","DOI":"10.1145\/3355396","type":"journal-article","created":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T14:53:33Z","timestamp":1570805613000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["MetaStrider"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9203-8011","authenticated-orcid":false,"given":"Sriseshan","family":"Srikanth","sequence":"first","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA"}]},{"given":"Anirudh","family":"Jain","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA"}]},{"given":"Joseph M.","family":"Lennon","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA"}]},{"given":"Thomas M.","family":"Conte","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA"}]},{"given":"Erik","family":"Debenedictis","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Albuquerque, NM"}]},{"given":"Jeanine","family":"Cook","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Albuquerque, NM"}]}],"member":"320","published-online":{"date-parts":[[2019,10,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Standard map reference implementation. [n.d.]. GCC std::map. Retrieved from https:\/\/gcc.gnu.org\/onlinedocs\/libstdc\/libstdc-html-USERS-3.4\/stl__map_8h-source.html."},{"key":"e_1_2_1_2_1","unstructured":"JEDEC High Bandwidth Memory Specification. [n.d.]. 
High bandwidth memory (HBM) DRAM. Retrieved from https:\/\/www.jedec.org\/standards-documents\/results\/HBM."},{"key":"e_1_2_1_3_1","unstructured":"Matrix Market. [n.d.]. Retrieved from https:\/\/math.nist.gov\/MatrixMarket\/."},{"key":"e_1_2_1_4_1","unstructured":"High Bandwidth Memory Characterization 1. [n.d.]. Retrieved from https:\/\/www.amd.com\/Documents\/High-Bandwidth-Memory-HBM.pdf."},{"key":"e_1_2_1_5_1","unstructured":"High Bandwidth Memory Characterization 2. [n.d.]. Retrieved from https:\/\/www.anandtech.com\/show\/9969\/jedec-publishes-hbm2-specification."},{"key":"e_1_2_1_6_1","unstructured":"High Bandwidth Memory Characterization 3. [n.d.]. 
Retrieved from https:\/\/www.gamersnexus.net\/news-pc\/2972-amd-vega-frontier-edition-tear-down-die-size-and-more."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2656893"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3005348"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2015.75"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3085572"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/110838844"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"volume-title":"Uppaluri Siva Ramachandra Murty et al","year":"1976","author":"Bondy John Adrian","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2014.03.012"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7552(98)00110-X"},{"volume-title":"Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS\u201908)","author":"Buluc Aydin","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342011403516"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1958016.1958031"},{"volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201917)","year":"2017","author":"Chatterjee N.","key":"e_1_2_1_20_1"},{"volume-title":"Proceedings of the International Workshop on OpenMP. 
Springer, 189--201","author":"Ciesko Jan","key":"e_1_2_1_21_1"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2699470"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049670"},{"volume-title":"Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC\u201917)","author":"DeBenedictis Erik P.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","first-page":"8","article-title":"Extending Moore\u2019s law via computationally error-tolerant computing","volume":"15","author":"Deng Bobin","year":"2018","journal-title":"ACM Trans. Architect. Code Optim."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRC.2016.7738714"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2017.8"},{"volume-title":"Albert Maurice Erisman, and John Ker Reid","year":"2017","author":"Duff Iain S.","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567810"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.3211117"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2015.7322472"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1457838.1457864"},{"volume-title":"Proceedings of the USSR Academy of Sciences.","year":"1962","author":"Adelson-Velsky G. Georgy","key":"e_1_2_1_33_1"},{"volume-title":"Shah","year":"2006","author":"Gilbert John R.","key":"e_1_2_1_34_1"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1137\/130948811"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/355791.355796"},{"volume-title":"Dally","year":"2015","author":"Han Song","key":"e_1_2_1_37_1"},{"volume-title":"Advances in Neural Information Processing Systems","author":"Han Song","key":"e_1_2_1_38_1"},{"volume-title":"Proceedings of the International Workshop on Applied Parallel Computing. 
Springer, 192--205","year":"2012","author":"Hapla V\u00e1clav","key":"e_1_2_1_39_1"},{"key":"e_1_2_1_40_1","unstructured":"MKL Intel. 2007. Intel math kernel library. Retrieved from https:\/\/software.intel.com\/en-us\/mkl."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/0010-4655(95)00031-A"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRC.2018.8638619"},{"volume-title":"Proceedings of the ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA\u201918)","author":"Jun Sang-Woo","key":"e_1_2_1_43_1"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Jeremy Kepner and John Gilbert. 2011. Graph Algorithms in the Language of Linear Algebra. SIAM.","DOI":"10.1137\/1.9780898719918"},{"volume-title":"Proceedings of the 7th Workshop on Irregular Applications: Architectures and Algorithms. ACM, 6.","author":"Peter","key":"e_1_2_1_45_1"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.76.3168"},{"volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201917)","year":"2017","author":"Kwon H.","key":"e_1_2_1_47_1"},{"volume-title":"Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI'12)","year":"2012","author":"Kyrola Aapo","key":"e_1_2_1_48_1"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1002\/cta.796"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.47"},{"volume-title":"Dally","year":"2018","author":"Liu Xingyu","key":"e_1_2_1_51_1"},{"key":"e_1_2_1_52_1","unstructured":"V. Manikandan V. P. Muralikrishna J. Ajayan and V. S. Mohammed Riyas Deen. [n.d.]. 
Static carry skip adder designed using 22-nm strained silicon CMOS technology operating under wide range of temperatures. Int. J. Eng. Tech. Res. 8 2 ([n.\u00a0d.])."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Mao Huizi","key":"e_1_2_1_53_1"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2012.6507483"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASPDAC.2017.7858395"},{"key":"e_1_2_1_56_1","first-page":"19","article-title":"Research problems and opportunities in memory systems","volume":"1","author":"Mutlu Onur","year":"2015","journal-title":"Supercomput. Front. Innovat."},{"key":"e_1_2_1_57_1","doi-asserted-by":"crossref","unstructured":"Yusuke Nagasaka Satoshi Matsuoka Ariful Azad and Ayd\u0131n Bulu\u00e7. 2018. High-performance sparse matrix-matrix products on Intel KNL and multicore architectures. 
arXiv preprint arXiv:1804.01698.","DOI":"10.1145\/3229710.3229720"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CASES.2015.7324551"},{"volume-title":"Cusparse library","author":"CUDA NVIDIA.","key":"e_1_2_1_59_1"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00067"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-20119-1_4"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1016\/0196-6774(89)90005-9"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01389000"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/115952.115961"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1137\/060650271"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522740"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2014.2341926"},{"volume-title":"Proceedings of the International Workshop on GPUs and Scientific Applications. 51--56","year":"2010","author":"Rupp Karl","key":"e_1_2_1_68_1"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2017.8091096"},{"volume-title":"Proceedings of the International Conference on Parallel Processing and Applied Mathematics. 
Springer, 559--570","year":"2013","author":"Saule Erik","key":"e_1_2_1_70_1"},{"volume-title":"An Interactive System for Combinatorial Scientific Computing with an Emphasis on Programmer Productivity","author":"Shah Viral B.","key":"e_1_2_1_71_1"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRC.2017.8123669"},{"volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"Srikanth Sriseshan","key":"e_1_2_1_73_1"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240302.3240314"},{"volume-title":"Proceedings of the 1st Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing (IPPS\/SPDP\u201998)","author":"Peter","key":"e_1_2_1_75_1"},{"volume-title":"Demmel","year":"2003","author":"Vuduc Richard Wilson","key":"e_1_2_1_76_1"},{"volume-title":"Advances in Neural Information Processing Systems","year":"2016","author":"Wen Wei","key":"e_1_2_1_77_1"},{"volume-title":"Proceedings of the International Conference on High Performance Computing for Computational Science. Springer, 421--434","year":"2010","author":"Yamazaki Ichitaro","key":"e_1_2_1_78_1"},{"volume-title":"Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms. 
Society for Industrial and Applied Mathematics, 254--260","year":"2004","author":"Yuster Raphael","key":"e_1_2_1_79_1"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/360128.360134"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3355396","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3355396","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:23:34Z","timestamp":1750202614000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3355396"}},"subtitle":["Architectures for Scalable Memory-centric Reduction of Sparse Data Streams"],"short-title":[],"issued":{"date-parts":[[2019,10,11]]},"references-count":79,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,12,31]]}},"alternative-id":["10.1145\/3355396"],"URL":"https:\/\/doi.org\/10.1145\/3355396","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2019,10,11]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}