{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:39:18Z","timestamp":1750307958674,"version":"3.41.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2007,9,1]],"date-time":"2007-09-01T00:00:00Z","timestamp":1188604800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2007,9]]},"abstract":"<jats:p>The need to reduce power and complexity will increase the interest in Switch On Event multithreading (coarse-grained multithreading). Switch On Event multithreading is a low-power and low-complexity mechanism to improve processor throughput by switching threads on execution stalls. Fairness may, however, become a problem in a multithreaded processor. Unless fairness is properly handled, some threads may starve while others consume all of the processor cycles. Heuristics that were devised in order to improve fairness in simultaneous multithreading are not applicable to Switch On Event multithreading. This paper defines the fairness metric using the ratio of the individual threads' speedups and shows how it can be enforced in Switch On Event multithreading. Fairness is controlled by forcing additional thread switch points. These switch points are determined dynamically by runtime estimation of the single threaded performance of each of the individual threads. We analyze the impact of the fairness enforcement mechanism on aggregate IPC and weighted speedup. We present simulation results of the performance of Switch On Event multithreading. Switch On Event multithreading achieves an average aggregate IPC increase of 26% over single thread and 12% weighted speedup when no fairness is enforced. In this case, a sixth of our runs resulted in poor fairness in which one thread ran extremely slowly (10 to 100 times slower than its single-thread performance), while the other thread's performance was hardly affected. By using the proposed mechanism, we can guarantee fairness at different levels of strictness and, in most cases, even improve the weighted speedup.<\/jats:p>","DOI":"10.1145\/1275937.1275939","type":"journal-article","created":{"date-parts":[[2007,9,14]],"date-time":"2007-09-14T13:44:55Z","timestamp":1189777495000},"page":"15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":32,"title":["Fairness enforcement in switch on event multithreading"],"prefix":"10.1145","volume":"4","author":[{"given":"Ron","family":"Gabor","sequence":"first","affiliation":[{"name":"Tel Aviv University; Intel Corporation, Tel Aviv"}]},{"given":"Shlomo","family":"Weiss","sequence":"additional","affiliation":[{"name":"Tel Aviv University, Tel Aviv"}]},{"given":"Avi","family":"Mendelson","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]}],"member":"320","published-online":{"date-parts":[[2007,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/325096.325119"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.223985"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2004.10002"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.51"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.37"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.446.0885"},{"key":"e_1_2_1_7_1","volume-title":"Proc. of the 2000 IEEE International Conference on Computer Design (ICCD '00)","author":"Cai G.","year":"2000","unstructured":"Cai , G. 2000 . Power-sensitive multithreaded architecture . In Proc. of the 2000 IEEE International Conference on Computer Design (ICCD '00) . 199. Cai, G. 2000. Power-sensitive multithreaded architecture. In Proc. of the 2000 IEEE International Conference on Computer Design (ICCD '00). 199."},{"volume-title":"Intl. EuroPar Conference. 535--540","author":"Cazorla F. J.","key":"e_1_2_1_8_1","unstructured":"Cazorla , F. J. , Knijnenburg , P. M. , Sakellariou , R. , Fernandez , E. , Ramirez , A. , and Valero , M . 2004a. Feasibility of QoS for SMT by resource allocation . In Intl. EuroPar Conference. 535--540 . Cazorla, F. J., Knijnenburg, P. M., Sakellariou, R., Fernandez, E., Ramirez, A., and Valero, M. 2004a. Feasibility of QoS for SMT by resource allocation. In Intl. EuroPar Conference. 535--540."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.17"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/605432.605427"},{"volume-title":"Spec CPU2000","key":"e_1_2_1_11_1","unstructured":"CPU2000. Standard performance evaluation corporation , Spec CPU2000 . CPU2000. Standard performance evaluation corporation, Spec CPU2000."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.42"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/319151.319169"},{"volume-title":"Proc. of the International Symposium on High Performance Computing (ISHPC '97)","author":"Eickemeyer R. J.","key":"e_1_2_1_14_1","unstructured":"Eickemeyer , R. J. , Johnson , R. E. , Kunkel , S. R. , Lim , B.-H. , Squillante , M. S. , and Wu , C . -F. E. 1997. Evaluation of multithreaded processors and thread-switch policies . In Proc. of the International Symposium on High Performance Computing (ISHPC '97) . 75--90. Eickemeyer, R. J., Johnson, R. E., Kunkel, S. R., Lim, B.-H., Squillante, M. S., and Wu, C.-F. E. 1997. Evaluation of multithreaded processors and thread-switch policies. In Proc. of the International Symposium on High Performance Computing (ISHPC '97). 75--90."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1028176.1006722"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/115952.115988"},{"key":"e_1_2_1_17_1","unstructured":"Glaskowsky P. N. 2003. IBM previews Power5. Microprocessor Report.  Glaskowsky P. N. 2003. IBM previews Power5. Microprocessor Report."},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Gochman S. Mendelson A. Naveh A. and Rotem E. 2006. Introduction to Intel Core Duo processor architecture. Intel Technology Journal.  Gochman S. Mendelson A. Naveh A. and Rotem E. 2006. Introduction to Intel Core Duo processor architecture. Intel Technology Journal.","DOI":"10.1535\/itj.1002.01"},{"volume-title":"Proc. of the 1997 Advances in Parallel and Distributed Computing Conference (APDC '97)","author":"Gruenewald W.","key":"e_1_2_1_19_1","unstructured":"Gruenewald , W. and Ungerer , T . 1997. A multithreaded processor designed for distributed shared memory systems . In Proc. of the 1997 Advances in Parallel and Distributed Computing Conference (APDC '97) . 206. Gruenewald, W. and Ungerer, T. 1997. A multithreaded processor designed for distributed shared memory systems. In Proc. of the 1997 Advances in Parallel and Distributed Computing Conference (APDC '97). 206."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/115953.115978"},{"key":"e_1_2_1_21_1","first-page":"2","article-title":"Intel's P6 uses decoupled superscalar design","volume":"9","author":"Gwennap L.","year":"1995","unstructured":"Gwennap , L. 1995 . Intel's P6 uses decoupled superscalar design . Microprocessor Report 9 , 2 (Feb.). Gwennap, L. 1995. Intel's P6 uses decoupled superscalar design. Microprocessor Report 9, 2 (Feb.).","journal-title":"Microprocessor Report"},{"volume-title":"Proc. of the 20th IEEE International Performance, Computing, and Communications Conference. 319--328","author":"Haskins J. W.","key":"e_1_2_1_22_1","unstructured":"Haskins , J. W. and Skadron , J . 2001. Inexpensive throughput enhancement in small-scale embedded microprocessors with block multithreading: Extensions characterization, and tradeoffs . In Proc. of the 20th IEEE International Performance, Computing, and Communications Conference. 319--328 . Haskins, J. W. and Skadron, J. 2001. Inexpensive throughput enhancement in small-scale embedded microprocessors with block multithreading: Extensions characterization, and tradeoffs. In Proc. of the 20th IEEE International Performance, Computing, and Communications Conference. 319--328."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/269005.266689"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.33"},{"volume-title":"Proc. of MICRO-37","author":"Kalla R.","key":"e_1_2_1_25_1","unstructured":"Kalla , R. , Sinharoy , B. , and Tendler , J . 2004. IBM Power5 chip: A dual-core multithreaded processor . In Proc. of MICRO-37 . 40--47. Kalla, R., Sinharoy, B., and Tendler, J. 2004. IBM Power5 chip: A dual-core multithreaded processor. In Proc. of MICRO-37. 40--47."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025127.1026001"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2005.35"},{"key":"e_1_2_1_28_1","unstructured":"Krewell K. 2004. AMD vs. Intel in dual-core duel. Microprocessor Report.  Krewell K. 2004. AMD vs. Intel in dual-core duel. Microprocessor Report."},{"key":"e_1_2_1_29_1","unstructured":"Krewell K. 2006. Intel looks to core for success. Microprocessor Report.  Krewell K. 2006. Intel looks to core for success. Microprocessor Report."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2005.379"},{"volume-title":"Proc. of the International Symposium on Performance Analysis of Systems and Software. 164--171","author":"Luo K.","key":"e_1_2_1_31_1","unstructured":"Luo , K. , Gummaraju , J. , and Franklin , M . 2001. Balancing throughput and fairness in SMT processors . In Proc. of the International Symposium on Performance Analysis of Systems and Software. 164--171 . Luo, K., Gummaraju, J., and Franklin, M. 2001. Balancing throughput and fairness in SMT processors. In Proc. of the International Symposium on Performance Analysis of Systems and Software. 164--171."},{"key":"e_1_2_1_32_1","unstructured":"Marr D. Binns F. Hill D. Hinton G. Koufaty D. Miller J. and Upton M. 2002. Hyper-threading technology architecture and microarchitecture. Intel Technology Journal 6.  Marr D. Binns F. Hill D. Hinton G. Koufaty D. Miller J. and Upton M. 2002. Hyper-threading technology architecture and microarchitecture. Intel Technology Journal 6."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2005.34"},{"volume-title":"Proc. of the 6th International Symposium on High-Performance Computer Architecture (HPCA '00)","author":"Mowry T.","key":"e_1_2_1_34_1","unstructured":"Mowry , T. and Ramkissoon , S . 2000. Software-controlled multithreading using informing memory operations . In Proc. of the 6th International Symposium on High-Performance Computer Architecture (HPCA '00) . 121--132. Mowry, T. and Ramkissoon, S. 2000. Software-controlled multithreading using informing memory operations. In Proc. of the 6th International Symposium on High-Performance Computer Architecture (HPCA '00). 121--132."},{"volume-title":"Proc. of HPCA-9. 129","author":"Mutlu O.","key":"e_1_2_1_35_1","unstructured":"Mutlu , O. , Stark , J. , Wilkerson , C. , and Patt , Y. N . 2003. Runahead execution: An alternative to very large instruction windows for out-of-order processors . In Proc. of HPCA-9. 129 . Mutlu, O., Stark, J., Wilkerson, C., and Patt, Y. N. 2003. Runahead execution: An alternative to very large instruction windows for out-of-order processors. In Proc. of HPCA-9. 129."},{"volume-title":"Proc. of the Workshop on Multithreaded Execution And Compilation.","author":"Raasch S. E.","key":"e_1_2_1_36_1","unstructured":"Raasch , S. E. and Reinhardt , S. K . 1999. Applications of thread prioritization in SMT processors . In Proc. of the Workshop on Multithreaded Execution And Compilation. Raasch, S. E. and Reinhardt, S. K. 1999. Applications of thread prioritization in SMT processors. In Proc. of the Workshop on Multithreaded Execution And Compilation."},{"volume-title":"Proc. of PACT-12","author":"Raasch S. E.","key":"e_1_2_1_37_1","unstructured":"Raasch , S. E. and Reinhardt , S. K . 2003. The impact of resource partitioning on SMT processors . In Proc. of PACT-12 . 15. Raasch, S. E. and Reinhardt, S. K. 2003. The impact of resource partitioning on SMT processors. In Proc. of PACT-12. 15."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.31"},{"volume-title":"Performance Analysis of Systems and Software, 2001. ISPASS. 2001 IEEE International Symposium on, 180--183","author":"Sazeides Y.","key":"e_1_2_1_39_1","unstructured":"Sazeides , Y. and Juan , T . 2001. How to compare the performance of two SMT microarchitectures . In Performance Analysis of Systems and Software, 2001. ISPASS. 2001 IEEE International Symposium on, 180--183 . Sazeides, Y. and Juan, T. 2001. How to compare the performance of two SMT microarchitectures. In Performance Analysis of Systems and Software, 2001. ISPASS. 2001 IEEE International Symposium on, 180--183."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/90.502236"},{"key":"e_1_2_1_41_1","unstructured":"Singhal R. Venkatraman K. Cohn E. Holm J. Koufaty D. Lin M.-J. Madhav M. Mattwandel M. Nidhi N. Pearce J. and Seshadri M. 2004. Performance analysis and validation of the Intel Pentium4 processor on 90nm technology. Intel Technology Journal 8.  Singhal R. Venkatraman K. Cohn E. Holm J. Koufaty D. Lin M.-J. Madhav M. Mattwandel M. Nidhi N. Pearce J. and Seshadri M. 2004. Performance analysis and validation of the Intel Pentium4 processor on 90nm technology. Intel Technology Journal 8."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/356989.357011"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.10"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCASIA.2005.22"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.461.0005"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/195470.195583"},{"volume-title":"Proc. of MICRO-34","author":"Tullsen D. M.","key":"e_1_2_1_47_1","unstructured":"Tullsen , D. M. and Brown , J . 2001. Handling long-latency loads in a simultaneous multithreading processor . In Proc. of MICRO-34 . 318--327. Tullsen, D. M. and Brown, J. 2001. Handling long-latency loads in a simultaneous multithreading processor. In Proc. of MICRO-34. 318--327."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/232973.232993"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/285930.286011"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.8"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/641865.641867"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1275937.1275939","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1275937.1275939","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:58:00Z","timestamp":1750258680000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1275937.1275939"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,9]]},"references-count":51,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2007,9]]}},"alternative-id":["10.1145\/1275937.1275939"],"URL":"https:\/\/doi.org\/10.1145\/1275937.1275939","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2007,9]]},"assertion":[{"value":"2007-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}