{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T11:09:18Z","timestamp":1771067358277,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,28]],"date-time":"2022-06-28T00:00:00Z","timestamp":1656374400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,28]]},"DOI":"10.1145\/3524059.3532363","type":"proceedings-article","created":{"date-parts":[[2022,6,16]],"date-time":"2022-06-16T16:13:11Z","timestamp":1655395991000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Efficient, out-of-memory sparse MTTKRP on massively parallel architectures"],"prefix":"10.1145","author":[{"given":"Andy","family":"Nguyen","sequence":"first","affiliation":[{"name":"University of Oregon"}]},{"given":"Ahmed E.","family":"Helal","sequence":"additional","affiliation":[{"name":"Intel Labs"}]},{"given":"Fabio","family":"Checconi","sequence":"additional","affiliation":[{"name":"Intel Labs"}]},{"given":"Jan","family":"Laukemann","sequence":"additional","affiliation":[{"name":"University of Erlangen-N\u00fcrnberg"}]},{"given":"Jesmin Jahan","family":"Tithi","sequence":"additional","affiliation":[{"name":"Intel Labs"}]},{"given":"Yongseok","family":"Soh","sequence":"additional","affiliation":[{"name":"University of Oregon"}]},{"given":"Teresa","family":"Ranadive","sequence":"additional","affiliation":[{"name":"Laboratory for Physical Sciences"}]},{"given":"Fabrizio","family":"Petrini","sequence":"additional","affiliation":[{"name":"Intel Labs"}]},{"given":"Jee W.","family":"Choi","sequence":"additional","affiliation":[{"name":"University of Oregon"}]}],"member":"320","published-online":{"date-parts":[[2022,6,28]]},"reference":[{"key":"e_1_3_2_1_2_1","volume-title":"https:\/\/software.intel.com\/en-us\/get-started-with-intel-dpcpp-compatibility-tool. Online","author":"Compatibility Tool Intel","year":"2022","unstructured":"2022. Intel DPC++ Compatibility Tool . https:\/\/software.intel.com\/en-us\/get-started-with-intel-dpcpp-compatibility-tool. Online ; accessed 14 May 2022 . 2022. Intel DPC++ Compatibility Tool. https:\/\/software.intel.com\/en-us\/get-started-with-intel-dpcpp-compatibility-tool. Online; accessed 14 May 2022."},{"key":"e_1_3_2_1_3_1","volume-title":"Nsight Compute Command Line Interface. https:\/\/docs.nvidia.com\/nsight-compute\/pdf\/NsightComputeCli.pdf. Online","year":"2022","unstructured":"2022. Nsight Compute Command Line Interface. https:\/\/docs.nvidia.com\/nsight-compute\/pdf\/NsightComputeCli.pdf. Online ; accessed 14 May 2022 . 2022. Nsight Compute Command Line Interface. https:\/\/docs.nvidia.com\/nsight-compute\/pdf\/NsightComputeCli.pdf. Online; accessed 14 May 2022."},{"key":"e_1_3_2_1_4_1","volume-title":"Nsight Systems User Guide. https:\/\/docs.nvidia.com\/nsight-systems\/pdf\/UserGuide.pdf. Online","year":"2022","unstructured":"2022. Nsight Systems User Guide. https:\/\/docs.nvidia.com\/nsight-systems\/pdf\/UserGuide.pdf. Online ; accessed 14 May 2022 . 2022. Nsight Systems User Guide. https:\/\/docs.nvidia.com\/nsight-systems\/pdf\/UserGuide.pdf. Online; accessed 14 May 2022."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1137\/060676489"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3225058.3225133"},{"key":"e_1_3_2_1_8_1","volume-title":"The Xe GPU Architecture. In 2020 IEEE Hot Chips 32 Symposium (HCS). IEEE Computer Society, 1--27","author":"Blythe David","year":"2020","unstructured":"David Blythe . 2020 . The Xe GPU Architecture. In 2020 IEEE Hot Chips 32 Symposium (HCS). IEEE Computer Society, 1--27 . David Blythe. 2020. The Xe GPU Architecture. In 2020 IEEE Hot Chips 32 Symposium (HCS). IEEE Computer Society, 1--27."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CYBERSEC.2016.017"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2018.00066"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/2968826.2968971"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447818.3460692"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2016.01.027"},{"key":"e_1_3_2_1_14_1","volume-title":"Discovery Science, Petra Kralj Novak, Tomislav \u0160muc, and Sa\u0161o D\u017eeroski (Eds.)","author":"Fernandes Sofia","unstructured":"Sofia Fernandes , Hadi Fanaee-T, and Jo\u00e3o Gama . 2019. Evolving Social Networks Analysis via Tensor Decompositions: From Global Event Detection Towards Local Pattern Discovery and Specification . In Discovery Science, Petra Kralj Novak, Tomislav \u0160muc, and Sa\u0161o D\u017eeroski (Eds.) . Springer International Publishing , Cham , 385--395. Sofia Fernandes, Hadi Fanaee-T, and Jo\u00e3o Gama. 2019. Evolving Social Networks Analysis via Tensor Decompositions: From Global Event Detection Towards Local Pattern Discovery and Specification. In Discovery Science, Petra Kralj Novak, Tomislav \u0160muc, and Sa\u0161o D\u017eeroski (Eds.). Springer International Publishing, Cham, 385--395."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1137\/17M1115873"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313548"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447818.3461703"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623658"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113355"},{"key":"e_1_3_2_1_20_1","volume-title":"Dissecting the NVIDIA volta GPU architecture via microbenchmarking. arXiv preprint arXiv:1804.06826","author":"Jia Zhe","year":"2018","unstructured":"Zhe Jia , Marco Maggioni , Benjamin Staiger , and Daniele P Scarpazza . 2018. Dissecting the NVIDIA volta GPU architecture via microbenchmarking. arXiv preprint arXiv:1804.06826 ( 2018 ). Zhe Jia, Marco Maggioni, Benjamin Staiger, and Daniele P Scarpazza. 2018. Dissecting the NVIDIA volta GPU architecture via microbenchmarking. arXiv preprint arXiv:1804.06826 (2018)."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2339530.2339583"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807624"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"David B Kirk. 2006. NVIDIA CUDA Software and GPU Parallel Computing Architecture. (2006).  David B Kirk. 2006. NVIDIA CUDA Software and GPU Parallel Computing Architecture. (2006).","DOI":"10.1145\/1296907.1296909"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1137\/07070111X"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.89"},{"key":"e_1_3_2_1_27_1","unstructured":"Jiajia Li Yuchen Ma and Richard Vuduc. 2018. ParTI! : A Parallel Tensor Infrastructure for Multicore CPUs and GPUs. http:\/\/parti-project.org Last updated: Jan 2020.  Jiajia Li Yuchen Ma and Richard Vuduc. 2018. ParTI! : A Parallel Tensor Infrastructure for Multicore CPUs and GPUs. http:\/\/parti-project.org Last updated: Jan 2020."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00022"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330366"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2017.75"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2017.75"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1155\/2016\/8301709"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2549523"},{"key":"e_1_3_2_1_34_1","volume-title":"A computer Oriented Geodetic Data Base","author":"Morton Guy M","year":"1966","unstructured":"Guy M Morton . 1966. A computer Oriented Geodetic Data Base ; and a New Technique in File Sequencing. Technical Report. IBM Ltd ., 150 Laurier Ave., Ottawa, Ontario, Canada. https:\/\/dominoweb.draco.res.ibm.com\/reports\/Morton 1966 .pdf Guy M Morton. 1966. A computer Oriented Geodetic Data Base; and a New Technique in File Sequencing. Technical Report. IBM Ltd., 150 Laurier Ave., Ottawa, Ontario, Canada. https:\/\/dominoweb.draco.res.ibm.com\/reports\/Morton1966.pdf"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356216"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356216"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00023"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00023"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2915921"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1137\/18M1210691"},{"key":"e_1_3_2_1_41_1","volume-title":"Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL","author":"Reinders James","unstructured":"James Reinders , Ben Ashbaugh , James Brodman , Michael Kinsner , John Pennycook , and Xinmin Tian . 2021. Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL . Springer Nature . James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, and Xinmin Tian. 2021. Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL. Springer Nature."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13278-012-0069-5"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3428226"},{"key":"e_1_3_2_1_44_1","volume-title":"Tech. Rep. NVR-2008-003 1, 1","author":"Sengupta Shubhabrata","year":"2008","unstructured":"Shubhabrata Sengupta , Mark Harris , Michael Garland , 2008 . Efficient Parallel Scan Algorithms for GPUs. NVIDIA, Santa Clara, CA , Tech. Rep. NVR-2008-003 1, 1 (2008), 1--17. Shubhabrata Sengupta, Mark Harris, Michael Garland, et al. 2008. Efficient Parallel Scan Algorithms for GPUs. NVIDIA, Santa Clara, CA, Tech. Rep. NVR-2008-003 1, 1 (2008), 1--17."},{"key":"e_1_3_2_1_45_1","volume-title":"Distributed Methods for High-Dimensional and Large-Scale Tensor Factorization. In 2014 IEEE International Conference on Data Mining. IEEE, 989--994","author":"Shin Kijung","year":"2014","unstructured":"Kijung Shin and U Kang . 2014 . Distributed Methods for High-Dimensional and Large-Scale Tensor Factorization. In 2014 IEEE International Conference on Data Mining. IEEE, 989--994 . Kijung Shin and U Kang. 2014. Distributed Methods for High-Dimensional and Large-Scale Tensor Factorization. In 2014 IEEE International Conference on Data Mining. IEEE, 989--994."},{"key":"e_1_3_2_1_46_1","volume-title":"FROSTT: The Formidable Repository of Open Sparse Tensors and Tools","author":"Smith Shaden","year":"2017","unstructured":"Shaden Smith , Jee W. Choi , Jiajia Li , Richard Vuduc , Jongsoo Park , Xing Liu , and George Karypis . 2017 . FROSTT: The Formidable Repository of Open Sparse Tensors and Tools . http:\/\/frostt.io\/ Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis. 2017. FROSTT: The Formidable Repository of Open Sparse Tensors and Tools. http:\/\/frostt.io\/"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2833179.2833183"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.113"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2015.27"},{"key":"e_1_3_2_1_50_1","volume-title":"Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing","author":"Solomonik Edgar","unstructured":"Edgar Solomonik , Devin Matthews , Jeff Hammond , and James Demmel . 2013. Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing . IEEE , 813--824. Edgar Solomonik, Devin Matthews, Jeff Hammond, and James Demmel. 2013. Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. IEEE, 813--824."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/3433701.3433724"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330354"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2555243.2555255"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442539"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1693453.1693472"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178495"}],"event":{"name":"ICS '22: 2022 International Conference on Supercomputing","location":"Virtual Event","acronym":"ICS '22","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 36th ACM International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524059.3532363","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524059.3532363","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:37Z","timestamp":1750188637000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524059.3532363"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,28]]},"references-count":55,"alternative-id":["10.1145\/3524059.3532363","10.1145\/3524059"],"URL":"https:\/\/doi.org\/10.1145\/3524059.3532363","relation":{},"subject":[],"published":{"date-parts":[[2022,6,28]]},"assertion":[{"value":"2022-06-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}