{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T05:00:25Z","timestamp":1755838825410,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":57,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,6,26]],"date-time":"2019-06-26T00:00:00Z","timestamp":1561507200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,6,26]]},"DOI":"10.1145\/3330345.3330388","type":"proceedings-article","created":{"date-parts":[[2019,6,18]],"date-time":"2019-06-18T12:14:30Z","timestamp":1560860070000},"page":"484-496","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["BonVoision"],"prefix":"10.1145","author":[{"given":"Bo","family":"Fang","sequence":"first","affiliation":[{"name":"University of British Columbia"}]},{"given":"Hassan","family":"Halawa","sequence":"additional","affiliation":[{"name":"University of British Columbia"}]},{"given":"Karthik","family":"Pattabiraman","sequence":"additional","affiliation":[{"name":"University of British Columbia"}]},{"given":"Matei","family":"Ripeanu","sequence":"additional","affiliation":[{"name":"University of British Columbia"}]},{"given":"Sriram","family":"Krishnamoorthy","sequence":"additional","affiliation":[{"name":"Pacific Northwest National Laboratory"}]}],"member":"320","published-online":{"date-parts":[[2019,6,26]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"{n. d.}. 
International Technology Roadmap for Semiconductors.\/ Model for Assessment of CMOS Technologies and Roadmaps (MASTAR). http:\/\/www.itrs.net. Semiconductor Industries Association."},{"key":"e_1_3_2_1_2_1","unstructured":"{n. d.}. Samsung Electronics Develops World's First Eight-Die Multi-Chip Package for Multimedia Cell Phones. http:\/\/www.samsung.com. Samsung Electronics Corporation."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807670"},{"volume-title":"2015 IEEE 17th International Conference on High Performance Computing and Communications.","author":"Bautista-Gomez L.","key":"e_1_3_2_1_4_1","unstructured":"L. Bautista-Gomez and F. Cappello. 2015. Exploiting Spatial Smoothness in HPC Applications to Detect Silent Data Corruption. In 2015 IEEE 17th International Conference on High Performance Computing and Communications."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.3173"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00071"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"J. Chen, S. Li, and Z. Chen. 2016. GPU-ABFT: Optimizing Algorithm-Based Fault Tolerance for Heterogeneous Systems with GPUs. In NAS.","DOI":"10.1109\/NAS.2016.7549404"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"J. Chen, X. Liang, and Z. Chen. {n. d.}. 
Online Algorithm-Based Fault Tolerance for Cholesky Decomposition on Heterogeneous Systems with GPUs. In 2016 IPDPS.","DOI":"10.1109\/IPDPS.2016.81"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2015.05.187"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/Co-HPC.2014.6"},{"key":"e_1_3_2_1_12_1","unstructured":"P. Cicotti, S. M. Mniszewski, and L. Carrington. 2013. CoMD: A Classical Molecular Dynamics Mini-app. http:\/\/exmatex.github.io\/CoMD\/doxygen-mpi\/index.html"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.1"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"J. T. Daly. 2006. A Higher Order Estimate of the Optimum Checkpoint Interval for Restart Dumps. Future Gener. Comput. Syst. (2006).","DOI":"10.1016\/j.future.2004.11.016"},{"key":"e_1_3_2_1_15_1","unstructured":"Timothy J. Dell. 1997. A White Paper on the Benefits of Chipkill-Correct ECC for PC Server Main Memory by."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"N. El-Sayed and B. Schroeder. 2014. Checkpoint\/restart in practice: When simple is better. In 2014 Cluster.","DOI":"10.1109\/CLUSTER.2014.6968777"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Alexandra Poulos et al. 2018. 
Improving Application Resilience by Extending Error Correction with Contextual Information. In IEEE\/ACM 8th FTXS@SC 2018.","DOI":"10.1109\/FTXS.2018.00006"},{"volume-title":"ACM\/IEEE 2002 Conference.","author":"G. Bosilca","key":"e_1_3_2_1_18_1","unstructured":"G. Bosilca et al. {n. d.}. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes. In Supercomputing, ACM\/IEEE 2002 Conference."},{"volume-title":"DSN 2019 (Fast-abstract)","author":"Fang B.","key":"e_1_3_2_1_20_1","unstructured":"B. Fang, J. Chen, K. Pattabiraman, M. Ripeanu, and S. Krishnamoorthy. {n. d.}. Towards Predicting the Impact of Roll-Forward Failure Recovery for HPC Applications. In DSN 2019 (Fast-abstract). IEEE."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078597.3078609"},{"volume-title":"2016 46th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN)","author":"Fang B.","key":"e_1_3_2_1_22_1","unstructured":"B. Fang, Q. Lu, K. Pattabiraman, M. Ripeanu, and S. Gurumurthi. 2016. 
ePVF: An Enhanced Program Vulnerability Factor Methodology for Cross-Layer Resilience Analysis. In 2016 46th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN). 168--179."},{"key":"e_1_3_2_1_23_1","unstructured":"Mark Gottscho. {n. d.}. Opportunistic Memory Systems in Presence of Hardware Variability. ({n. d.})."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"M. Gottscho, C. Schoeny, L. Dolecek, and P. Gupta. {n. d.}. Software-Defined Error-Correcting Codes. In 2016 DSN-W.","DOI":"10.1109\/DSN-W.2016.67"},{"key":"e_1_3_2_1_25_1","unstructured":"Shinsuke Hamada, Soramichi Akiyama, and Mitaro Namiki. {n. d.}. Reactive NaN Repair for Applying Approximate Memory to Numerical Applications. CoRR ({n. d.})."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1127577.1127590"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1984.1676475"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"I. Karlin. 2012. LULESH Programming Model and Performance Ports Overview. https:\/\/codesign.llnl.gov\/pdfs\/lulesh_Ports.pdf","DOI":"10.2172\/1059462"},{"key":"e_1_3_2_1_29_1","unstructured":"Andi Kleen. {n. d.}. mcelog: memory error handling in user space."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2384616.2384648"},{"key":"e_1_3_2_1_31_1","unstructured":"Los Alamos National Laboratory. 
2016. The PENNANT Mini-App v0.9. https:\/\/github.com\/losalamos\/PENNANT"},{"volume-title":"SC '16","author":"Levy Scott","key":"e_1_3_2_1_32_1","unstructured":"Scott Levy, Kurt B. Ferreira, and Patrick G. Bridges. 2016. Improving Application Resilience to Memory Errors with Lightweight Compression. In SC '16."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00022"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063445"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126915"},{"key":"e_1_3_2_1_36_1","first-page":"213","article-title":"Flikker: Saving DRAM Refresh-power Through Critical Data Partitioning","volume":"46","author":"Liu Song","year":"2011","unstructured":"Song Liu, Karthik Pattabiraman, Thomas Moscibroda, and Benjamin G. Zorn. 2011. Flikker: Saving DRAM Refresh-power Through Critical Data Partitioning. SIGPLAN Not. 46, 3 (March 2011), 213--224.","journal-title":"SIGPLAN Not."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2666356.2594337"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.62"},{"key":"e_1_3_2_1_39_1","volume-title":"SC '14","author":"Michalak Sarah E.","year":"2014","unstructured":"Sarah E. Michalak et al. 2014. 
Correctness Field Testing of Production and Decommissioned High Performance Computing Platforms at Los Alamos National Laboratory (SC '14)."},{"key":"e_1_3_2_1_40_1","volume-title":"Proceedings of the 24th USENIX Conference on Security Symposium (SEC'15)","author":"Ming Jiang","year":"2015","unstructured":"Jiang Ming, Dinghao Wu, Gaoyao Xiao, Jun Wang, and Peng Liu. 2015. TaintPipe: Pipelined Symbolic Taint Analysis. In Proceedings of the 24th USENIX Conference on Security Symposium (SEC'15). USENIX Association, Berkeley, CA, USA, 65--80. http:\/\/dl.acm.org\/citation.cfm?id=2831143.2831148"},{"volume-title":"SCA '03","author":"M\u00fcller Matthias","key":"e_1_3_2_1_41_1","unstructured":"Matthias M\u00fcller, David Charypar, and Markus Gross. {n. d.}. Particle-based Fluid Simulation for Interactive Applications. In SCA '03."},{"key":"e_1_3_2_1_42_1","unstructured":"D. Nicholaeff, N. Davis, D. Trujillo, and R. W. Robey. {n. d.}. Cell-Based Adaptive Mesh Refinement Implemented with General Purpose Graphics Processing Units. ({n. d.})."},{"key":"e_1_3_2_1_43_1","volume-title":"Single event upset at ground level","author":"Normand E.","year":"1996","unstructured":"E. Normand. 1996. 
Single event upset at ground level. IEEE Transactions on Nuclear Science (1996)."},{"key":"e_1_3_2_1_44_1","unstructured":"H. Patil et al. {n. d.}. Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation. In MICRO-37."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CSAC.2004.2"},{"key":"e_1_3_2_1_46_1","unstructured":"Martin Rinard et al. {n. d.}. Enhancing Server Availability and Security Through Failure-oblivious Computing (OSDI'04)."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993518"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"crossref","unstructured":"C. Schoeny, F. Sala, M. Gottscho, I. Alam, P. Gupta, and L. Dolecek. 2017. Context-aware resiliency: Unequal message protection for random-access memories. In ITW'17.","DOI":"10.1109\/ITW.2017.8278033"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304588"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"crossref","unstructured":"J. Sloan, R. Kumar, and G. Bronevetsky. 2012. Algorithmic approaches to low overhead fault detection for sparse linear algebra. 
In DSN.","DOI":"10.1109\/DSN.2012.6263938"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2775054.2694348"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542476.1542526"},{"volume-title":"Proceedings of the 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Tan Li","key":"e_1_3_2_1_53_1","unstructured":"Li Tan, Shuaiwen Leon Song, Panruo Wu, Zizhong Chen, Rong Ge, and Darren J. Kerbyson. 2015. Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing. In Proceedings of the 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 786--796. 
https:\/\/doi.org\/xpl\/articleDetails.jsp?arnumber=7161565"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2010.48"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.5555\/942806.943827"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.2"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018750"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2907294.2907315"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/361147.361115"}],"event":{"name":"ICS '19: 2019 International Conference on Supercomputing","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"],"location":"Phoenix Arizona","acronym":"ICS '19"},"container-title":["Proceedings of the ACM International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3330345.3330388","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3330345.3330388","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:53:26Z","timestamp":1750204406000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3330345.3330388"}},"subtitle":["leveraging spatial data smoothness for recovery from memory soft errors"],"short-title":[],"issued":{"date-parts":[[2019,6,26]]},"references-count":57,"alternative-id":["10.1145\/3330345.3330388","10.1145\/3330345"],"URL":"https:\/\/doi.org\/10.1145\/3330345.3330388","relation":{},"subject":[],"published":{"date-parts":[[2019,6,26]]},"assertion":[{"value":"2019-06-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}