{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T22:15:24Z","timestamp":1779920124225,"version":"3.53.1"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,21]],"date-time":"2023-06-21T00:00:00Z","timestamp":1687305600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DE-SC0022209"],"award-info":[{"award-number":["DE-SC0022209"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,21]]},"DOI":"10.1145\/3577193.3593715","type":"proceedings-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T18:47:05Z","timestamp":1687286825000},"page":"360-372","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8637-3307","authenticated-orcid":false,"given":"Shixun","family":"Wu","sequence":"first","affiliation":[{"name":"University of California, Riverside, Riverside, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2688-8058","authenticated-orcid":false,"given":"Yujia","family":"Zhai","sequence":"additional","affiliation":[{"name":"University of California, Riverside, Riverside, California, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0177-502X","authenticated-orcid":false,"given":"Jinyang","family":"Liu","sequence":"additional","affiliation":[{"name":"University of California, Riverside, Riverside, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5092-3987","authenticated-orcid":false,"given":"Jiajun","family":"Huang","sequence":"additional","affiliation":[{"name":"University of California, Riverside, Riverside, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-2079-8130","authenticated-orcid":false,"given":"Zizhe","family":"Jian","sequence":"additional","affiliation":[{"name":"University of California, Riverside, Riverside, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3477-8043","authenticated-orcid":false,"given":"Bryan","family":"Wong","sequence":"additional","affiliation":[{"name":"University of California, Riverside, Riverside, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2578-4940","authenticated-orcid":false,"given":"Zizhong","family":"Chen","sequence":"additional","affiliation":[{"name":"University of California, Riverside, Riverside, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,6,21]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dandelion Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/ Software available from tensorflow.org."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/156137.2813183"},{"key":"e_1_3_2_1_3_1","volume-title":"Soft errors in commercial semiconductor technology: Overview and scaling trends","author":"Baumann Robert","year":"2002","unstructured":"Robert Baumann. 2002. Soft errors in commercial semiconductor technology: Overview and scaling trends. IEEE 2002 Reliability Physics Tutorial Notes, Reliability Fundamentals 7 (2002)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078597.3078617"},{"key":"e_1_3_2_1_5_1","volume-title":"A High-Performance Implementation of Atomistic Spin Dynamics Simulations on x86 CPUs. arXiv preprint arXiv:2304.10966","author":"Chen Hongwei","year":"2023","unstructured":"Hongwei Chen, Yujia Zhai, Joshua J Turner, and Adrian Feiguin. 2023. A High-Performance Implementation of Atomistic Spin Dynamics Simulations on x86 CPUs. arXiv preprint arXiv:2304.10966 (2023)."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/PADSW.2014.7097827"},{"key":"e_1_3_2_1_7_1","volume-title":"TVM: An automated end-to-end optimizing compiler for deep learning. arXiv preprint arXiv:1802.04799","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. arXiv preprint arXiv:1802.04799 (2018)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536158"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442533"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2008.58"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/HASE.2008.13"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2903150.2903170"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.53"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2015.05.187"},{"key":"e_1_3_2_1_15_1","volume-title":"FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. arXiv preprint arXiv:2205.14135","author":"Dao Tri","year":"2022","unstructured":"Tri Dao, Daniel Y Fu, Stefano Ermon, Atri Rudra, and Christopher R\u00e9. 2022. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. arXiv preprint arXiv:2205.14135 (2022)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2517639"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPA.2011.50"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342010391989"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSPEC.2016.7420396"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2015.108"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2001.941390"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2014.2320502"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372419"},{"key":"e_1_3_2_1_24_1","volume-title":"Algorithm-based fault tolerance for matrix operations","author":"Huang Kuang-Hua","year":"1984","unstructured":"Kuang-Hua Huang and Jacob A Abraham. 1984. Algorithm-based fault tolerance for matrix operations. IEEE transactions on computers 100, 6 (1984), 518--528."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476184"},{"key":"e_1_3_2_1_27_1","volume-title":"Dependable computing and fault-tolerance. Digest of Papers FTCS-15","author":"Laprie Jean-Claude","year":"1985","unstructured":"Jean-Claude Laprie. 1985. Dependable computing and fault-tolerance. Digest of Papers FTCS-15 (1985), 2--11."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389074"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126964"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356195"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126915"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Robert Lucas James Ang Keren Bergman Shekhar Borkar William Carlson Laura Carrington George Chiu Robert Colwell William Dally Jack Dongarra et al. 2014. DOE advanced scientific computing advisory subcommittee (ASCAC) report: top ten exascale research challenges. Technical Report. USDOE Office of Science (SC)(United States).","DOI":"10.2172\/1222713"},{"key":"e_1_3_2_1_33_1","volume-title":"Analyzing software requirements errors in safety-critical, embedded systems. In [1993] Proceedings of the IEEE International Symposium on Requirements Engineering","author":"Lutz Robyn R","unstructured":"Robyn R Lutz. 1993. Analyzing software requirements errors in safety-critical, embedded systems. In [1993] Proceedings of the IEEE International Symposium on Requirements Engineering. IEEE, 126--133."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-ED.1979.19370"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSI-TSA.2014.6839639"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/VTEST.1999.766651"},{"key":"e_1_3_2_1_37_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2022","unstructured":"NVIDIA. Retrieved in 2022. https:\/\/github.com\/NVIDIA\/cutlass. Online."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/24.994926"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/24.994913"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126960"},{"key":"e_1_3_2_1_41_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024--8035."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1002\/jcc.20289"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2005.34"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00017"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342014522573"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3208040.3208050"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.207595"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2907294.2907306"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600212.2600232"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2013.05.002"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2009.14"},{"key":"e_1_3_2_1_52_1","volume-title":"Proceedings of the ACM International Conference on Supercomputing. 127--138","author":"Zhai Yujia","year":"2021","unstructured":"Yujia Zhai, Elisabeth Giem, Quan Fan, Kai Zhao, Jinyang Liu, and Zizhong Chen. 2021. FT-BLAS: a high performance BLAS implementation with online fault tolerance. In Proceedings of the ACM International Conference on Supercomputing. 127--138."},{"key":"e_1_3_2_1_53_1","volume-title":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 705--716","author":"Zhai Yujia","year":"2022","unstructured":"Yujia Zhai, Mohannad Ibrahim, Yiqin Qiu, Fabian Boemer, Zizhong Chen, Alexey Titov, and Alexander Lyashevsky. 2022. Accelerating encrypted computing on intel gpus. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 705--716."},{"key":"e_1_3_2_1_54_1","volume-title":"ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs. arXiv preprint arXiv:2210.03052","author":"Zhai Yujia","year":"2022","unstructured":"Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, and Yibo Zhu. 2022. ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs. arXiv preprint arXiv:2210.03052 (2022)."},{"key":"e_1_3_2_1_55_1","volume-title":"Algorithm-based fault tolerance for convolutional neural networks","author":"Zhao Kai","year":"2020","unstructured":"Kai Zhao, Sheng Di, Sihuan Li, Xin Liang, Yujia Zhai, Jieyang Chen, Kaiming Ouyang, Franck Cappello, and Zizhong Chen. 2020. Algorithm-based fault tolerance for convolutional neural networks. IEEE Transactions on Parallel and Distributed Systems (2020)."}],"event":{"name":"ICS '23: 37th International Conference on Supercomputing","location":"Orlando FL USA","acronym":"ICS '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 37th International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577193.3593715","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3577193.3593715","content-type":"text\/html","content-version":"vor","intended-application":"syndication"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:31Z","timestamp":1750178851000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577193.3593715"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,21]]},"references-count":55,"alternative-id":["10.1145\/3577193.3593715","10.1145\/3577193"],"URL":"https:\/\/doi.org\/10.1145\/3577193.3593715","relation":{},"subject":[],"published":{"date-parts":[[2023,6,21]]},"assertion":[{"value":"2023-06-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}