{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T19:27:52Z","timestamp":1672428472877},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2007,3]]},"abstract":"<jats:p>\n            As silicon technologies move into the nanometer regime, transistor reliability is expected to wane as devices become subject to extreme process variation, particle-induced transient errors, and transistor wear-out. Unless these challenges are addressed, computer vendors can expect low yields and short mean-times-to-failure. In this article, we examine the challenges of designing complex computing systems in the presence of transient and permanent faults. We select one small aspect of a typical chip multiprocessor (CMP) system to study in detail, a single CMP router switch. Our goal is to design a\n            <jats:italic>BulletProof<\/jats:italic>\n            CMP switch architecture capable of tolerating significant levels of various types of defects. We first assess the vulnerability of the CMP switch to transient faults. To better understand the impact of these faults, we evaluate our CMP switch designs using circuit-level timing on detailed physical layouts. Our infrastructure represents a new level of fidelity in architectural-level fault analysis, as we can accurately track faults as they occur, noting whether they manifest or not, because of masking in the circuits, logic, or architecture. Our experimental results are quite illuminating. We find that transient faults, because of their fleeting nature, are of little concern for our CMP switch, even within large switch fabrics with fast clocks. Next, we develop a unified model of permanent faults, based on the time-tested bathtub curve. 
Using this convenient abstraction, we analyze the reliability versus area tradeoff across a wide spectrum of CMP switch designs, ranging from unprotected designs to fully protected designs with on-line repair and recovery capabilities. Protection is considered at multiple levels from the entire system down through arbitrary partitions of the design. We find that designs are attainable that can tolerate a larger number of defects with less overhead than na\u00efve triple-modular redundancy, using domain-specific techniques, such as end-to-end error detection, resource sparing, automatic circuit decomposition, and iterative diagnosis and reconfiguration.\n          <\/jats:p>","DOI":"10.1145\/1216544.1216545","type":"journal-article","created":{"date-parts":[[2007,4,5]],"date-time":"2007-04-05T19:20:08Z","timestamp":1175800808000},"page":"2","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Architecting a reliable CMP switch architecture"],"prefix":"10.1145","volume":"4","author":[{"given":"Kypros","family":"Constantinides","sequence":"first","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}]},{"given":"Stephen","family":"Plaza","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}]},{"given":"Jason","family":"Blome","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}]},{"given":"Valeria","family":"Bertacco","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}]},{"given":"Scott","family":"Mahlke","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}]},{"given":"Todd","family":"Austin","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}]},{"given":"Bin","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, 
TX"}]},{"given":"Michael","family":"Orshansky","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX"}]}],"member":"320","published-online":{"date-parts":[[2007,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"International Conference on Computer Aided Design (ICCAD '95)","author":"Al-Asaad H.","unstructured":"Al-Asaad, H. and Hayes, J. P. 1995. Design verification via simulation and automatic test pattern generation. In International Conference on Computer Aided Design (ICCAD '95). IEEE Computer Society Press, Washington, D.C. 174--180."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of International Test Conference (ITC). 77--84","author":"Barnett T. S.","unstructured":"Barnett, T. S. and Singh, A. D. 2003. Relating yield models to burn-in fall-out in time. In Proceedings of International Test Conference (ITC). 77--84."},{"key":"e_1_2_1_3_1","volume-title":"International Test Conference (ITC '97)","author":"Bohl E.","unstructured":"Bohl, E., Lindenkreuz, T., and Stephan, R. 1997. The fail-stop controller AE11. In International Test Conference (ITC '97). IEEE, Washington, D.C. 567--577."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/996566.996588"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.461.0077"},{"key":"e_1_2_1_6_1","volume-title":"Proc. of International Conference on Dependable Systems and Networks (DSN). 
51--60","author":"Bower F. A.","unstructured":"Bower, F. A., Shealy, P. G., Ozev, S., and Sorin, D. J. 2004. Tolerating hard faults in microprocessor array structures. In Proc. of International Conference on Dependable Systems and Networks (DSN). 51--60."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of International Workshop on Parallel Computer Routing and Communication (PCRCW). 241--255","author":"Dally W. J.","unstructured":"Dally, W. J., Dennison, L. R., Harris, D., Kan, K., and Xanthopoulos, T. 1994. The reliable router: A reliable and high-performance communication substrate for parallel computers. In Proceedings of International Workshop on Parallel Computer Routing and Communication (PCRCW). 241--255."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859631"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/996070.1009962"},{"key":"e_1_2_1_10_1","volume-title":"International Electron Devices Meeting.","author":"Hu C. K.","unstructured":"Hu, C. K. and Rosenberg, R. 1999. Scaling the effect on electromigration in on-chip CU wiring. In International Electron Devices Meeting."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/513918.513943"},{"key":"e_1_2_1_12_1","unstructured":"ITRS. 2004. 
International technology roadmap for semiconductors (ITRS) 2004 update. Document available at http:\/\/public.itrs.net\/."},{"key":"e_1_2_1_13_1","unstructured":"J. E. D. E. Council. 2002. Failure mechanisms and models for semiconductor devices. JEDEC Publication JEP122-A."},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Karypis G. and Kumar V. 1998. hMETIS: A hypergraph partitioning package.","DOI":"10.1145\/266021.266273"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/266021.266273"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2004.119"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 29th International Symposium on Computer Architecture (ISCA-02)","author":"Mukherjee S. S.","unstructured":"Mukherjee, S. S., Kontz, M., and Reinhardt, S. K. 2002. Detailed design and implementation of redundant multithreading alternatives. In Proceedings of the 29th International Symposium on Computer Architecture (ISCA-02). 99--110."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings International Symposium on Microarchitecture (MICRO). 29--42","author":"Mukherjee S. S.","unstructured":"Mukherjee, S. S., Weaver, C., Emer, J. S., Reinhardt, S. K., and Austin, T. M. 2003. A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor. 
In Proceedings International Symposium on Microarchitecture (MICRO). 29--42."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.37"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.544235"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/871506.871530"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339652"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of International Symposium on Circuits and Systems (ISCAS). 377--380","author":"Riess B. M.","unstructured":"Riess, B. M. and Ettelt, G. G. 1995. Speed: Fast and efficient timing driven placement. In Proceedings of International Symposium on Circuits and Systems (ISCAS). 377--380."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/795672.796966"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859667"},{"key":"e_1_2_1_27_1","volume-title":"Dependable adaptive computing systems","author":"Saxena N.","unstructured":"Saxena, N. and McCluskey, E. 1998. Dependable adaptive computing systems. In IEEE Systems, Man, and Cybernetics Conference."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the International Conference on Dependable Systems and Networks (DSN). 389--398","author":"Shivakumar P.","unstructured":"Shivakumar, P., Kistler, M., Keckler, S. W., Burger, D., and Alvisi, L. 2002. 
Modeling the effect of technology trends on the soft error rate of combinational logic. In Proceedings of the International Conference on Dependable Systems and Networks (DSN). 389--398."},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of International Conference on Computer Design (ICCD). 481--488","author":"Shivakumar P.","unstructured":"Shivakumar, P., Keckler, S. W., Moore, C. R., and Burger, D. 2003. Exploiting microarchitectural redundancy for defect tolerance. In Proceedings of International Conference on Computer Design (ICCD). 481--488."},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Siewiorek D. P. and Swarz R. S. 1998. Reliable Computer Systems: Design and Evaluation 3rd ed. AK Peters Ltd Wellesley MA USA.","DOI":"10.1201\/9781439863961"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1024393.1024420"},{"key":"e_1_2_1_32_1","volume-title":"Proc. of International Symposium on Fault-Tolerant Computing (FTCS). 432--440","author":"Spainhower L.","unstructured":"Spainhower, L. and Gregg, T. A. 1998. G4: A fault-tolerant CMOS mainframe. In Proc. of International Symposium on Fault-Tolerant Computing (FTCS). 
432--440."},{"key":"e_1_2_1_33_1","volume-title":"Proc. of International Conference on Dependable Systems and Networks (DSN). 177","author":"Srinivasan J.","unstructured":"Srinivasan, J., Adve, S. V., Bose, P., and Rivers, J. A. 2004. The impact of technology scaling on lifetime reliability. In Proc. of International Conference on Dependable Systems and Networks (DSN). 177."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.462.0265"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/513918.514085"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of International Conference on Dependable Systems and Networks (DSN). 61","author":"Wang N. J.","unstructured":"Wang, N. J., Quek, J., Rafacz, T. M., and Patel, S. J. 2004. Characterizing the effects of transient faults on a high-performance processor pipeline. In Proceedings of International Conference on Dependable Systems and Networks (DSN). 61."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 2001 International Conference on Dependable Systems and Networks (DSN '01)","author":"Weaver C.","unstructured":"Weaver, C. and Austin, T. 2001. A fault tolerant approach to microprocessor design. In Proceedings of the 2001 International Conference on Dependable Systems and Networks (DSN '01). 
IEEE, Washington, D.C. 411--420."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the Annual International Symposium on Computer Architecture (ISCA). 264--275","author":"Weaver C.","unstructured":"Weaver, C., Emer, J. S., Mukherjee, S. S., and Reinhardt, S. K. 2004. Techniques to reduce the soft error rate of a high-performance microprocessor. In Proceedings of the Annual International Symposium on Computer Architecture (ISCA). 264--275."},{"key":"e_1_2_1_39_1","volume-title":"et al","author":"Wu E.","year":"2002","unstructured":"Wu, E. et al. 2002. Interplay of voltage and temperature acceleration of oxide breakdown for ultra-thin gate dioxides. 
Solid State Electronics Journal."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.401.0019"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.401.0003"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1216544.1216545","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T20:52:14Z","timestamp":1672260734000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1216544.1216545"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,3]]},"references-count":40,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,3]]}},"alternative-id":["10.1145\/1216544.1216545"],"URL":"https:\/\/doi.org\/10.1145\/1216544.1216545","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,3]]},"assertion":[{"value":"2007-03-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}