{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:06:30Z","timestamp":1759133190051,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2016,1,28]],"date-time":"2016-01-28T00:00:00Z","timestamp":1453939200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"U.S. NSF","award":["CCF-1320730, CCF-1351054 (CAREER) and EPS-0903806"],"award-info":[{"award-number":["CCF-1320730, CCF-1351054 (CAREER) and EPS-0903806"]}]},{"name":"State of Kansas through the Kansas Board of Regents"},{"DOI":"10.13039\/501100001809","name":"NSF of China","doi-asserted-by":"crossref","award":["91418203"],"award-info":[{"award-number":["91418203"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2016,1,28]]},"abstract":"<jats:p>The increasing adoption of graphics processing units (GPUs) for high-performance computing raises a reliability challenge that has generally been ignored in traditional GPUs. GPUs usually support thousands of parallel threads and require a sizable register file. Such a large register file is both power-hungry and highly susceptible to soft errors. Although ECC has been adopted for the register file in modern GPUs, it causes considerable power overhead, which further increases the power stress. Thus, an energy-efficient soft-error protection mechanism is desirable. Besides its extremely low leakage power consumption, resistive memory (e.g., spin-transfer torque RAM) is also immune to radiation-induced soft errors due to its magnetic-field-based storage.
In this article, we propose to LEverage reSistive memory to enhance the Soft-error robustness and reduce the power consumption (LESS) of registers in General-Purpose computing on GPUs (GPGPUs). Since resistive memory has a longer write latency than SRAM, we explore the unique characteristics of GPGPU applications to obtain win-win gains: achieving near-full soft-error protection for the register file while substantially reducing energy consumption with negligible performance degradation. Our experimental results show that LESS is able to reduce the registers' soft-error vulnerability by 86% and achieve 61% energy savings with negligible (e.g., 1%) performance degradation.<\/jats:p>","DOI":"10.1145\/2827697","type":"journal-article","created":{"date-parts":[[2016,2,1]],"date-time":"2016-02-01T20:37:54Z","timestamp":1454359074000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Exploring Soft-Error Robust and Energy-Efficient Register File in GPGPUs using Resistive Memory"],"prefix":"10.1145","volume":"21","author":[{"given":"Jingweijia","family":"Tan","sequence":"first","affiliation":[{"name":"University of Houston, Houston, TX"}]},{"given":"Zhi","family":"Li","sequence":"additional","affiliation":[{"name":"University of Kansas"}]},{"given":"Mingsong","family":"Chen","sequence":"additional","affiliation":[{"name":"East China Normal University"}]},{"given":"Xin","family":"Fu","sequence":"additional","affiliation":[{"name":"University of Houston"}]}],"member":"320","published-online":{"date-parts":[[2016,1,28]]},"reference":[{"volume-title":"Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software. 163--174","author":"Bakhoda A.","key":"e_1_2_1_1_1","unstructured":"A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt. 2009. Analyzing CUDA workloads using a detailed GPU simulator.
In Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software. 163--174."},{"volume-title":"Proceedings of the 5th International Symposium on High-Performance Computer Architecture. 13--22","author":"Brooks D.","key":"e_1_2_1_2_1","unstructured":"D. Brooks and M. Martonosi. 1999. Dynamically exploiting narrow width operands to improve processor power and performance. In Proceedings of the 5th International Symposium on High-Performance Computer Architecture. 13--22."},{"volume-title":"Proceedings of the 39th Annual International Symposium on Computer Architecture. 49--60","author":"Brunie N.","key":"e_1_2_1_3_1","unstructured":"N. Brunie, S. Collange, and G. Diamos. 2012. Simultaneous branch and warp interweaving for sustained GPU performance. In Proceedings of the 39th Annual International Symposium on Computer Architecture. 49--60."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522314"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. 225--234","author":"Chen Z.","key":"e_1_2_1_6_1","unstructured":"Z. Chen, D. Kaeli, and N. Rubin. 2013.
Characterizing scalar opportunities in GPGPU applications. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. 225--234."},{"volume-title":"Proceedings of the International Conference on Parallel Processing. 46--55","author":"Collange S.","key":"e_1_2_1_7_1","unstructured":"S. Collange, D. Defour, and Y. Zhang. 2009. Dynamic detection of uniform and affine vectors in GPGPU computations. In Proceedings of the International Conference on Parallel Processing. 46--55."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2012.2185930"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/SiPS.2012.11"},{"volume-title":"Proceedings of SELSE'12","author":"Farazmand N.","key":"e_1_2_1_10_1","unstructured":"N. Farazmand, R. Ubal, and D. Kaeli. 2012. Statistical fault injection-based AVF analysis of a GPU architecture. In Proceedings of SELSE'12."},{"volume-title":"Document Number: Brmramslscltrl-Freescale MRAM Technology.","year":"2007","key":"e_1_2_1_11_1","unstructured":"Freescale. 2007. Document Number: Brmramslscltrl-Freescale MRAM Technology."},{"volume-title":"Proceedings of the IEEE 17th International Symposium on High Performance Computer Architecture. 25--36","author":"Fung W. W. L.","key":"e_1_2_1_12_1","unstructured":"W. W. L. Fung and T. M. Aamodt. 2011.
Thread block compaction for efficient SIMT control flow. In Proceedings of the IEEE 17th International Symposium on High Performance Computer Architecture. 25--36."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000093"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155675"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522330"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522331"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1816012"},{"key":"e_1_2_1_18_1","unstructured":"Intel. 2015. http:\/\/ark.intel.com\/products\/88040\/Intel-Core-i7-5775C-Processor-6M-Cache-up-to-3_70-GHz."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485952"},{"key":"e_1_2_1_20_1","unstructured":"D. Kanter. 2010. AMD's Cayman GPU architecture. http:\/\/www.realworldtech.com\/cayman\/."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485964"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669172"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2007.99"},{"volume-title":"Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture. 29","author":"Mukherjee S. S.","key":"e_1_2_1_24_1","unstructured":"S. S. Mukherjee, C. Weaver, J. Emer, S. K. Reinhardt, and T. Austin. 2003. A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor.
In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture. 29."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155656"},{"key":"e_1_2_1_26_1","unstructured":"NVIDIA. 2009. NVIDIA's Next Generation CUDA Compute Architecture: Fermi. http:\/\/www.nvidia.com\/content\/PDF\/fermi_white_papers\/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf."},{"volume-title":"CUDA Programming Guide Version 3.0","author":"NVIDIA","key":"e_1_2_1_27_1","unstructured":"NVIDIA. 2010. CUDA Programming Guide Version 3.0. Nvidia Corporation."},{"key":"e_1_2_1_28_1","unstructured":"NVIDIA. 2012. NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110 http:\/\/www.nvidia.com\/content\/PDF\/kepler\/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf."},{"key":"e_1_2_1_29_1","unstructured":"NVIDIA. 2014. http:\/\/www.nvidia.com\/object\/cuda_sdks.html."},{"volume-title":"Proceedings of IEEE 20th International Symposium on High Performance Computer Architecture. 49--59","author":"Palframan D. J.","key":"e_1_2_1_30_1","unstructured":"D. J. Palframan, N. S. Kim, and M. H. Lipasti. 2014. Precision-aware soft error protection for GPUs.
In Proceedings of IEEE 20th International Symposium on High Performance Computer Architecture. 49--59."},{"key":"e_1_2_1_31_1","unstructured":"Parboil. 2012. Parboil Benchmark Suite. URL: http:\/\/impact.crhc.illinois.edu\/parboil.php."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522352"},{"volume-title":"STT-RAM for shared memory in GPUs. Master's Thesis","author":"Satyamoorthy P.","key":"e_1_2_1_33_1","unstructured":"P. Satyamoorthy. 2011. STT-RAM for shared memory in GPUs. Master's Thesis. University of Virginia."},{"volume-title":"Proceedings of IEEE 17th International Symposium on High Performance Computer Architecture. 50--61","author":"Smullen C. W.","key":"e_1_2_1_35_1","unstructured":"C. W. Smullen, V. Mohan, A. Nigam, S. Gurumurthi, and M. R. Stan. 2011. Relaxing non-volatility for fast and energy-efficient STT-RAM caches. In Proceedings of IEEE 17th International Symposium on High Performance Computer Architecture. 50--61."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2011.6081425"},{"volume-title":"Proceedings of IEEE 15th International Symposium on High Performance Computer Architecture. 239--249","author":"Sun G.","key":"e_1_2_1_37_1","unstructured":"G. Sun, X. Dong, Y. Xie, J. Li, and Y. Chen. 2009.
A novel architecture of the 3D stacked MRAM L2 cache for CMPs. In Proceedings of IEEE 15th International Symposium on High Performance Computer Architecture. 239--249."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2010.2090914"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155659"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2011.6114182"},{"volume-title":"Proceedings of the Design, Automation & Test in Europe Conference & Exhibition.","author":"Tan J.","key":"e_1_2_1_41_1","unstructured":"J. Tan, T. Li, and X. Fu. 2015. Soft-error reliability and power co-optimization for GPGPUs register file using resistive memory. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition."},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","unstructured":"S. Tehrani. 2010. Status and Prospect for MRAM Technology.","DOI":"10.1109\/HOTCHIPS.2010.7480057"},{"key":"e_1_2_1_43_1","unstructured":"S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. 2008. Cacti 5.1. HP Labs Tech. Rep."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2484762.2484774"},{"volume-title":"Proceedings of 31st Annual International Symposium on Computer Architecture (ISCA'04)","author":"Weaver C.","key":"e_1_2_1_45_1","unstructured":"C. Weaver, J. Emer, S. S. Mukherjee, and S. K.
Reinhardt. 2004. Techniques to reduce the soft error rate of a high-performance microprocessor. In Proceedings of 31st Annual International Symposium on Computer Architecture (ISCA'04). 264--275."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/11575467_8"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.36"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2827697","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2827697","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:43:23Z","timestamp":1750225403000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2827697"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,1,28]]},"references-count":46,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,1,28]]}},"alternative-id":["10.1145\/2827697"],"URL":"https:\/\/doi.org\/10.1145\/2827697","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2016,1,28]]},"assertion":[{"value":"2015-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication
History"}},{"value":"2016-01-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}