{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T23:42:46Z","timestamp":1764027766230,"version":"build-2065373602"},"reference-count":50,"publisher":"MDPI AG","issue":"24","license":[{"start":{"date-parts":[[2021,12,10]],"date-time":"2021-12-10T00:00:00Z","timestamp":1639094400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>With technology scaling, maintaining the reliability of dynamic random-access memory (DRAM) has become more challenging. Therefore, on-die error correction codes have been introduced to accommodate reliability issues in DDR5. However, the current solution still suffers from high overhead when a large DRAM capacity is used to deliver high performance. We present a DRAM chip architecture that can track faults at byte-level DRAM cell errors to address this problem. DRAM faults are classified as temporary or permanent in our proposed architecture, with no additional pins and with minor DRAM chip modifications. Hence, we achieve reliability comparable to that of other state-of-the-art solutions while incurring negligible performance and energy overhead. Furthermore, the faulty locations are efficiently exposed to the operating system (OS). Thus, we can significantly reduce the required scrubbing cycle by scrubbing only faulty DRAM pages while reducing the system failure probability up to 5000\u223c7000 times relative to conventional operation.<\/jats:p>","DOI":"10.3390\/s21248271","type":"journal-article","created":{"date-parts":[[2021,12,10]],"date-time":"2021-12-10T08:17:58Z","timestamp":1639124278000},"page":"8271","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["OBET: On-the-Fly Byte-Level Error Tracking for Correcting and Detecting Faults in Unreliable DRAM Systems"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3029-4268","authenticated-orcid":false,"given":"Duy-Thanh","family":"Nguyen","sequence":"first","affiliation":[{"name":"Department of Electronic Engineering, Kyung Hee University, Yongin-si 17104, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3864-8027","authenticated-orcid":false,"given":"Nhut-Minh","family":"Ho","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National University of Singapore, Singapore 117418, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4281-2053","authenticated-orcid":false,"given":"Weng-Fai","family":"Wong","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National University of Singapore, Singapore 117418, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8871-8695","authenticated-orcid":false,"given":"Ik-Joon","family":"Chang","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Kyung Hee University, Yongin-si 17104, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2021,12,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1145\/2678373.2665726","article-title":"Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors","volume":"42","author":"Kim","year":"2014","journal-title":"ACM Sigarch Comput. Archit. News"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1145\/2954679.2872390","article-title":"ANVIL: Software-based protection against next-generation rowhammer attacks","volume":"51","author":"Aweke","year":"2016","journal-title":"ACM Sigplan Not."},{"key":"ref_3","unstructured":"Brasser, F., Davi, L., Gens, D., Liebchen, C., and Sadeghi, A.R. (2017, January 16\u201318). CAn\u2019t touch this: Software-only mitigation against Rowhammer attacks targeting kernel memory. Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1109\/55.843160","article-title":"Geometric effect of multiple-bit soft errors induced by cosmic ray neutrons on DRAM\u2019s","volume":"21","author":"Satoh","year":"2000","journal-title":"IEEE Electron Device Lett."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1109\/4.658626","article-title":"Cosmic ray soft error rates of 16-Mb DRAM memory chips","volume":"33","author":"Ziegler","year":"1998","journal-title":"IEEE J. -Solid-State Circuits"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"4684","DOI":"10.1109\/TCSI.2020.3018328","article-title":"Novel Speed-and-Power-Optimized SRAM Cell Designs With Enhanced Self-Recoverability From Single-and Double-Node Upsets","volume":"67","author":"Yan","year":"2020","journal-title":"IEEE Trans. Circuits Syst. I Regul. Pap."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yan, A., Fan, Z., Ding, L., Cui, J., Huang, Z., Wang, Q., Zheng, H., Girard, P., and Wen, X. (2021). Cost-Effective and Highly Reliable Circuit Components Design for Safety-Critical Applications. IEEE Trans. Aerosp. Electron. Syst., in press.","DOI":"10.1109\/TAES.2021.3103586"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1109\/TETC.2018.2871861","article-title":"Novel low cost, double-and-triple-node-upset-tolerant latch designs for nano-scale CMOS","volume":"9","author":"Yan","year":"2018","journal-title":"IEEE Trans. Emerg. Top. Comput."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Shah, A.P., and Waltl, M. (2020). Bias temperature instability aware and soft error tolerant radiation hardened 10T SRAM cell. Electronics, 9.","DOI":"10.3390\/electronics9020256"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1109\/TC.2020.2966200","article-title":"Information Assurance Through Redundant Design: A Novel TNU Error-Resilient Latch for Harsh Radiation Environment","volume":"69","author":"Yan","year":"2020","journal-title":"IEEE Trans. Comput."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Jiang, J., Zhu, W., Xiao, J., and Zou, S. (2018). A novel high-performance low-cost double-upset tolerant latch design. Electronics, 7.","DOI":"10.3390\/electronics7100247"},{"key":"ref_12","unstructured":"IBM (2021, November 05). Chipkill Memory. Available online: http:\/\/ps-2.kev009.com\/pccbbs\/pc_servers\/chipkilf.pdf."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Park, S.K. (2015, January 17\u201320). Technology scaling challenge and future prospects of DRAM and NAND flash memory. Proceedings of the 2015 IEEE International Memory Workshop (IMW), Monterey, CA, USA.","DOI":"10.1109\/IMW.2015.7150307"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1145\/2248487.2150989","article-title":"Cosmic rays don\u2019t strike twice: Understanding the nature of DRAM errors and the implications for system design","volume":"47","author":"Hwang","year":"2012","journal-title":"ACM Sigplan Not."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1109\/16.278509","article-title":"The effect of cosmic rays on the soft error rate of a DRAM at ground level","volume":"41","year":"1994","journal-title":"IEEE Trans. Electron Devices"},{"key":"ref_16","unstructured":"McKee, W., McAdams, H., Smith, E., McPherson, J., Janzen, J., Ondrusek, J., Hyslop, A., Russell, D., Coy, R., and Bergman, D. (May, January 30). Cosmic ray neutron induced upsets as a major contributor to the soft error rate of current and future generation DRAMs. Proceedings of the International Reliability Physics Symposium, Dallas, TX, USA."},{"key":"ref_17","unstructured":"JEDEC (2021, November 05). JESD209-4. Available online: https:\/\/www.jedec.org\/document_search?search_api_views_fulltext=JESD209-4."},{"key":"ref_18","unstructured":"JEDEC (2021, November 05). DDR5 SDRAM Standard. Available online: https:\/\/www.jedec.org\/standards-documents\/docs\/jesd79-5a."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Nair, P.J., Sridharan, V., and Qureshi, M.K. (2016, January 18\u201322). XED: Exposing on-die error detection information for strong memory reliability. Proceedings of the 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul, Korea.","DOI":"10.1109\/ISCA.2016.38"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Gong, S.L., Kim, J., Lym, S., Sullivan, M., David, H., and Erez, M. (2018, January 24\u201328). Duo: Exposing on-chip redundancy to rank-level ecc for high reliability. Proceedings of the 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), Vienna, Austria.","DOI":"10.1109\/HPCA.2018.00064"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Cha, S., Seongil, O., Shin, H., Hwang, S., Park, K., Jang, S.J., Choi, J.S., Jin, G.Y., Son, Y.H., and Cho, H. (2017, January 4\u20138). Defect analysis and cost-effective resilience architecture for future DRAM devices. Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, USA.","DOI":"10.1109\/HPCA.2017.30"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1109\/JSSC.2014.2353799","article-title":"A 3.2 gbps\/pin 8 gbit 1.0 v lpddr4 sdram with integrated ecc engine for sub-1 v dram core operation","volume":"50","author":"Oh","year":"2014","journal-title":"IEEE J. Solid State Circuits"},{"key":"ref_23","unstructured":"Lee, C.J., Narasiman, V., Ebrahimi, E., Mutlu, O., and Patt, Y.N. (2010). DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems, Carnegie Mellon University. HPS Technical Report, TR-HPS-2010-002."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1145\/1816038.1815972","article-title":"The virtual write queue: Coordinating DRAM and last-level cache policies","volume":"38","author":"Stuecheli","year":"2010","journal-title":"ACM Sigarch Comput. Archit. News"},{"key":"ref_25","unstructured":"JEDEC (2021, November 05). DDR4 SDRAM Standard. Available online: https:\/\/www.jedec.org\/document_search?search_api_views_fulltext=JESD79-4D."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1145\/2786763.2694348","article-title":"Memory errors in modern systems: The good, the bad, and the ugly","volume":"43","author":"Sridharan","year":"2015","journal-title":"ACM Sigarch Comput. Archit. News"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sridharan, V., and Liberty, D. (2012, January 10\u201316). A study of DRAM failures in the field. Proceedings of the SC\u201912: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Salt Lake City, UT, USA.","DOI":"10.1109\/SC.2012.13"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1145\/1036474.1036498","article-title":"Automating software failure reporting","volume":"2","author":"Murphy","year":"2004","journal-title":"Queue"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Frigo, P., Vannacc, E., Hassan, H., Van Der Veen, V., Mutlu, O., Giuffrida, C., Bos, H., and Razavi, K. (2020, January 18\u201321). TRRespass: Exploiting the many sides of target row refresh. Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA.","DOI":"10.1109\/SP40000.2020.00090"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Kim, J.S., Patel, M., Ya\u011fl\u0131k\u00e7\u0131, A.G., Hassan, H., Azizi, R., Orosa, L., and Mutlu, O. (June, January 30). Revisiting rowhammer: An experimental analysis of modern dram devices and mitigation techniques. Proceedings of the 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain.","DOI":"10.1109\/ISCA45697.2020.00059"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"114406","DOI":"10.1016\/j.microrel.2021.114406","article-title":"Neutron-induced effects on a self-refresh DRAM","volume":"128","author":"Luza","year":"2022","journal-title":"Microelectron. Reliab."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Patel, M., Kim, J.S., Shahroodi, T., Hassan, H., and Mutlu, O. (2020, January 17\u201321). Bit-exact ecc recovery (BEER): Determining DRAM on-die ECC functions by exploiting DRAM data retention characteristics. Proceedings of the 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece.","DOI":"10.1109\/MICRO50266.2020.00034"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Son, Y.H., Lee, S., Seongil, O., Kwon, S., Kim, N.S., and Ahn, J.H. (2015, January 7\u201311). CiDRA: A cache-inspired DRAM resilience architecture. Proceedings of the 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), Burlingame, CA, USA.","DOI":"10.1109\/HPCA.2015.7056058"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Udipi, A.N., Muralimanohar, N., Balsubramonian, R., Davis, A., and Jouppi, N.P. (2012, January 9\u201313). LOT-ECC: Localized and tiered reliability mechanisms for commodity memory systems. Proceedings of the 2012 39th Annual International Symposium on Computer Architecture (ISCA), Portland, OR, USA.","DOI":"10.1109\/ISCA.2012.6237025"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Jeong, S., Kang, S., and Yang, J.S. (2020, January 20\u201324). PAIR: Pin-aligned In-DRAM ECC architecture using expandability of Reed-Solomon code. Proceedings of the 2020 57th ACM\/IEEE Design Automation Conference (DAC), San Francisco, CA, USA.","DOI":"10.1109\/DAC18072.2020.9218745"},{"key":"ref_36","unstructured":"Kleen, A. (2010, January 21\u201324). Mcelog: Memory error handling in user space. Proceedings of the International Linux System Technology Conference (Linux Kongress), Nuremberg (N\u00fcrnberg), Germany."},{"key":"ref_37","unstructured":"MICRON (2021, November 05). DDR5 16GB Die Datasheet. Available online: https:\/\/media-www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/dram\/ddr5\/16gb_ddr5_sdram_diereva.pdf."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1145\/1816038.1815980","article-title":"Use ECP, not ECC, for hard failures in resistive memories","volume":"38","author":"Schechter","year":"2010","journal-title":"ACM Sigarch Comput. Archit. News"},{"key":"ref_39","unstructured":"Tang, D., Carruthers, P., Totari, Z., and Shapiro, M.W. (2006, January 25\u201328). Assessment of the effect of memory page retirement on system RAS against hardware faults. Proceedings of the International Conference on Dependable Systems and Networks (DSN\u201906), Philadelphia, PA, USA."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1147\/rd.144.0395","article-title":"A class of optimal minimum odd-weight-column SEC-DED codes","volume":"14","author":"Hsiao","year":"1970","journal-title":"IBM J. Res. Dev."},{"key":"ref_41","unstructured":"(2021, November 05). BIOS and Kernel Developer\u2019s Guide for AMD NPT Family 0Fh Processors Publication. Available online: https:\/\/www.amd.com\/system\/files\/TechDocs\/32559.pdf."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1109\/TR.1975.5215337","article-title":"Determination of reliability using event-based Monte Carlo simulation","volume":"24","author":"Kamat","year":"1975","journal-title":"IEEE Trans. Reliab."},{"key":"ref_43","first-page":"1","article-title":"Faultsim: A fast, configurable memory-reliability simulator for conventional and 3d-stacked systems","volume":"12","author":"Nair","year":"2015","journal-title":"ACM Trans. Archit. Code Optim. (TACO)"},{"key":"ref_44","unstructured":"Henning, J. (2021, November 05). Standard Performance Evaluation Corporation (SPEC). Available online: https:\/\/www.spec.org\/cpu2006\/."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2024716.2024718","article-title":"The gem5 simulator","volume":"39","author":"Binkert","year":"2011","journal-title":"ACM Sigarch Comput. Archit. News"},{"key":"ref_46","unstructured":"Intel (2021, November 05). Intel D1649N. Available online: https:\/\/www.intel.com\/content\/www\/us\/en\/products\/sku\/193696\/intel-xeon-d1649n-processor-12m-cache-2-30ghz\/specifications.html."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Jagtap, R., Jung, M., Elsasser, W., Weis, C., Hansson, A., and Wehn, N. (2017, January 2). Integrating DRAM power-down modes in gem5 and quantifying their impact. Proceedings of the International Symposium on Memory Systems, Washington, DC, USA.","DOI":"10.1145\/3132402.3132444"},{"key":"ref_48","unstructured":"Xilinx (2021, November 05). Xilinx Logic Core Reed-Solomon Decoder 9.0. Available online: https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/rs_decoder\/v9_0\/pg107-rs-decoder.pdf."},{"key":"ref_49","unstructured":"Xilinx (2021, November 05). Xilinx Logic Core Reed-Solomon Encoder 9.0. Available online: https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/rs_encoder\/v9_0\/pg025_rs_encoder.pdf."},{"key":"ref_50","unstructured":"MICRON (2021, November 05). DDR4 Power Calculation. Available online: https:\/\/www.micron.com\/-\/media\/client\/global\/documents\/products\/technical-note\/dram\/tn4007_ddr4_power_calculation.pdf."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/24\/8271\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:45:04Z","timestamp":1760168704000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/24\/8271"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,10]]},"references-count":50,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["s21248271"],"URL":"https:\/\/doi.org\/10.3390\/s21248271","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,12,10]]}}}