{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:35:16Z","timestamp":1750307716141,"version":"3.41.0"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2009,6,1]],"date-time":"2009-06-01T00:00:00Z","timestamp":1243814400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2009,6]]},"abstract":"<jats:p>\n            Multithreading is widely used to increase processor throughput. As the number of shared resources increases, managing them while guaranteeing predicted performance becomes a major problem. Attempts have been made in previous work to ease this via different fairness mechanisms. In this article, we present a new approach to control the resource allocation and sharing via a service level agreement (SLA)-based mechanism; that is, via an agreement in which multithreaded processors guarantee a minimal level of service to the running threads. We introduce a new metric,\n            <jats:italic>C<\/jats:italic>\n            <jats:sub>SLA<\/jats:sub>\n            , for conformance to SLA in multithreaded processors and show that controlling resources using SLA allows for higher gains than are achievable by previously suggested fairness techniques. It also permits improving one metric (e.g., power) while maintaining SLA in another (e.g., performance). We compare SLA enforcement to schemes based on other fairness metrics, which are mostly targeted at equalizing execution parameters. 
We show that using SLA rather than fairness-based algorithms provides a range of acceptable execution points from which we can select the point that best fits our optimization target, such as maximizing the weighted speedup (sum of the speedups of the individual threads) or reducing power. We demonstrate the effectiveness of the new SLA approach using switch-on-event (coarse-grained) multithreading. Our weighted speedup improvement scheme successfully enforces SLA while improving the weighted speedup by an average of 10% for unbalanced threads. This result is significant when compared with performance losses that may be incurred by fairness enforcement methods. When optimizing for power reduction in unbalanced threads, SLA enforcement reduces the power by an average of 15%. SLA may be complemented by other power reduction methods to achieve further power savings\n            <jats:italic>and<\/jats:italic>\n            maintain the same service level for the threads. We also demonstrate differentiated SLA, where weighted speedup is maximized while each thread may have a different throughput constraint.\n          <\/jats:p>","DOI":"10.1145\/1543753.1543755","type":"journal-article","created":{"date-parts":[[2009,7,8]],"date-time":"2009-07-08T17:28:33Z","timestamp":1247074113000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Service level agreement for multithreaded processors"],"prefix":"10.1145","volume":"6","author":[{"given":"Ron","family":"Gabor","sequence":"first","affiliation":[{"name":"Tel Aviv University and Intel Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Avi","family":"Mendelson","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shlomo","family":"Weiss","sequence":"additional","affiliation":[{"name":"Tel Aviv 
University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2009,7,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2004.10002"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0001867800013185"},{"volume-title":"Proceedings of the 15th Annual Joint Conference of the IEEE Computer Societies. IEEE, 1.","author":"Bennett J.","key":"e_1_2_1_3_1","unstructured":"Bennett , J. and Zhang , H . 1996. WF2Q: worst-case fair weighted fair queueing . In Proceedings of the 15th Annual Joint Conference of the IEEE Computer Societies. IEEE, 1. Bennett, J. and Zhang, H. 1996. WF2Q: worst-case fair weighted fair queueing. In Proceedings of the 15th Annual Joint Conference of the IEEE Computer Societies. IEEE, 1."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.446.0885"},{"volume-title":"Proceedings of the 7th Internation Symposium on High-Performance Computer Architecture (HPCA-7). IEEE, 171--182","author":"Brooks D.","key":"e_1_2_1_5_1","unstructured":"Brooks , D. and Martonosi , M . 2001. Dynamic thermal management for high-performance microprocessors . In Proceedings of the 7th Internation Symposium on High-Performance Computer Architecture (HPCA-7). IEEE, 171--182 . Brooks, D. and Martonosi, M. 2001. Dynamic thermal management for high-performance microprocessors. In Proceedings of the 7th Internation Symposium on High-Performance Computer Architecture (HPCA-7). IEEE, 171--182."},{"volume-title":"Proceedings of the 5th International Symposium on High-Performance Computing. Springer.","author":"Cazorla F.","key":"e_1_2_1_6_1","unstructured":"Cazorla , F. , Fernandez , E. , Ramirez , A. , and Valero , M . 2003. Improving memory latency aware fetch policies for SMT processors . In Proceedings of the 5th International Symposium on High-Performance Computing. Springer. Cazorla, F., Fernandez, E., Ramirez, A., and Valero, M. 2003. 
Improving memory latency aware fetch policies for SMT processors. In Proceedings of the 5th International Symposium on High-Performance Computing. Springer."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2006.108"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2004.37"},{"volume-title":"Proceedings of the International Euro-Par Conference. Springer, 535--540","author":"Cazorla F. J.","key":"e_1_2_1_9_1","unstructured":"Cazorla , F. J. , Knijnenburg , P. M. , Sakellariou , R. , Fernandez , E. , Ramirez , A. , and Valero , M . 2004. Feasibility of QoS for SMT by resource allocation . In Proceedings of the International Euro-Par Conference. Springer, 535--540 . Cazorla, F. J., Knijnenburg, P. M., Sakellariou, R., Fernandez, E., Ramirez, A., and Valero, M. 2004. Feasibility of QoS for SMT by resource allocation. In Proceedings of the International Euro-Par Conference. Springer, 535--540."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.17"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1274971.1275005"},{"key":"e_1_2_1_12_1","volume-title":"FROCM: A fair and low-overhead method in SMT processor. In Proceedings of the 3rd International Conference on High-Performance Computing and Communications","author":"Chen S.","year":"2007","unstructured":"Chen , S. and Ma , P . 2007 . FROCM: A fair and low-overhead method in SMT processor. In Proceedings of the 3rd International Conference on High-Performance Computing and Communications . Springer . Chen, S. and Ma, P. 2007. FROCM: A fair and low-overhead method in SMT processor. In Proceedings of the 3rd International Conference on High-Performance Computing and Communications. Springer."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/635508.605427"},{"volume-title":"Spec CPU2000","key":"e_1_2_1_14_1","unstructured":"CPU2000. Standard Performance Evaluation Corporation , Spec CPU2000 . CPU2000. 
Standard Performance Evaluation Corporation, Spec CPU2000."},{"volume-title":"Proceedings of the International Symposium on High-Performance Computing (ISHPC '97)","author":"Eickemeyer R. J.","key":"e_1_2_1_15_1","unstructured":"Eickemeyer , R. J. , Johnson , R. E. , Kunkel , S. R. , Lim , B.-H. , Squillante , M. S. , and Wu , C . -F. E. 1997. Evaluation of multithreaded processors and thread-switch policies . In Proceedings of the International Symposium on High-Performance Computing (ISHPC '97) . Springer, 75--90. Eickemeyer, R. J., Johnson, R. E., Kunkel, S. R., Lim, B.-H., Squillante, M. S., and Wu, C.-F. E. 1997. Evaluation of multithreaded processors and thread-switch policies. In Proceedings of the International Symposium on High-Performance Computing (ISHPC '97). Springer, 75--90."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1028176.1006722"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/115952.115988"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.25"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1275937.1275939"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.1994.337677"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s005300050052"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.6"},{"key":"e_1_2_1_23_1","first-page":"2","article-title":"Intel's P6 uses decoupled superscalar design","volume":"9","author":"Gwennap L.","year":"1995","unstructured":"Gwennap , L. 1995 . Intel's P6 uses decoupled superscalar design . Micro-processor Rep. 9 , 2 . Gwennap, L. 1995. Intel's P6 uses decoupled superscalar design. Micro-processor Rep. 9, 2.","journal-title":"Micro-processor Rep."},{"key":"e_1_2_1_24_1","unstructured":"Halfhill T. R. 2006. Intel goes quad. Micro-processor Rep.  Halfhill T. R. 2006. Intel goes quad. 
Micro-processor Rep."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1152154.1152161"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1006209.1006246"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1254882.1254886"},{"key":"e_1_2_1_28_1","unstructured":"Jain R. Chiu D. and Hawe W. 1998. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. Arxiv preprint cs.NI\/9809099. http:\/\/adsabs.harvard.edu\/abs\/1998cs....9099J  Jain R. Chiu D. and Hawe W. 1998. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. Arxiv preprint cs.NI\/9809099. http:\/\/adsabs.harvard.edu\/abs\/1998cs....9099J"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025127.1026001"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1241601.1241609"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2005.35"},{"key":"e_1_2_1_32_1","unstructured":"Krewell K. 2006. Intel looks to core for success. Micro-processor Rep.  Krewell K. 2006. Intel looks to core for success. Micro-processor Rep."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS.2006.29"},{"volume-title":"Proceedings of the International Symposium on Performance Analysis of Systems and Software. IEEE, 164--171","author":"Luo K.","key":"e_1_2_1_34_1","unstructured":"Luo , K. , Gummaraju , J. , and Franklin , M . 2001. Balancing throughput and fairness in SMT processors . In Proceedings of the International Symposium on Performance Analysis of Systems and Software. IEEE, 164--171 . Luo, K., Gummaraju, J., and Franklin, M. 2001. Balancing throughput and fairness in SMT processors. In Proceedings of the International Symposium on Performance Analysis of Systems and Software. IEEE, 164--171."},{"key":"e_1_2_1_35_1","unstructured":"McGregor J. 2007. The New x86 Landscape. Micro-processor rep.  
McGregor J. 2007. The New x86 Landscape. Micro-processor rep."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2005.34"},{"volume-title":"Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA-9). IEEE, 129","author":"Mutlu O.","key":"e_1_2_1_37_1","unstructured":"Mutlu , O. , Stark , J. , Wilkerson , C. , and Patt , Y. N . 2003. Runahead execution: an alternative to very large instruction windows for out-of-order processors . In Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA-9). IEEE, 129 . Mutlu, O., Stark, J., Wilkerson, C., and Patt, Y. N. 2003. Runahead execution: an alternative to very large instruction windows for out-of-order processors. In Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA-9). IEEE, 129."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.24"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250671"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346188"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/280756.280790"},{"volume-title":"Proceedings of the Workshop on Multi-threaded Execution and Compilation. ACM.","author":"Raasch S. E.","key":"e_1_2_1_42_1","unstructured":"Raasch , S. E. and Reinhardt , S. K . 1999. Applications of thread prioritization in SMT processors . In Proceedings of the Workshop on Multi-threaded Execution and Compilation. ACM. Raasch, S. E. and Reinhardt, S. K. 1999. Applications of thread prioritization in SMT processors. In Proceedings of the Workshop on Multi-threaded Execution and Compilation. ACM."},{"volume-title":"Proceedings of the 12th International Conference on Parallel Architecture and Compilation Techniques (PACT-12)","author":"Raasch S. E.","key":"e_1_2_1_43_1","unstructured":"Raasch , S. E. and Reinhardt , S. K . 2003. 
The impact of resource partitioning on SMT processors . In Proceedings of the 12th International Conference on Parallel Architecture and Compilation Techniques (PACT-12) . ACM, 15--25. Raasch, S. E. and Reinhardt, S. K. 2003. The impact of resource partitioning on SMT processors. In Proceedings of the 12th International Conference on Parallel Architecture and Compilation Techniques (PACT-12). ACM, 15--25."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1152154.1152160"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1012888.1005704"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2005.176"},{"key":"e_1_2_1_47_1","unstructured":"Singhal R. Venkatraman K. Cohn E. Holm J. Koufaty D. Lin M.-J. Madhav M. Mattwandel M. Nidhi N. Pearce J. and Seshadri M. 2004. Performance analysis and validation of the Intel Pentium4 processor on 90nm technology. Intel Tech. J. 8.  Singhal R. Venkatraman K. Cohn E. Holm J. Koufaty D. Lin M.-J. Madhav M. Mattwandel M. Nidhi N. Pearce J. and Seshadri M. 2004. Performance analysis and validation of the Intel Pentium4 processor on 90nm technology. Intel Tech. J. 8."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/356989.357011"},{"key":"e_1_2_1_49_1","unstructured":"Tarjan D. Thoziyoor S. and Jouppi N. 2006. CACTI 4.0. Tech. rep. HP Laboratories.  Tarjan D. Thoziyoor S. and Jouppi N. 2006. CACTI 4.0. Tech. rep. HP Laboratories."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/195470.195583"},{"volume-title":"Proceedings of the 12th International Conference on Parallel Architecture and Compilation Techniques (PACT-12)","author":"Tuck N.","key":"e_1_2_1_51_1","unstructured":"Tuck , N. and Tullsen , D. M . 2003. Initial observations of the simultaneous multi-threading Pentium 4 processor . In Proceedings of the 12th International Conference on Parallel Architecture and Compilation Techniques (PACT-12) . ACM, 26. Tuck, N. and Tullsen, D. 
M. 2003. Initial observations of the simultaneous multi-threading Pentium 4 processor. In Proceedings of the 12th International Conference on Parallel Architecture and Compilation Techniques (PACT-12). ACM, 26."},{"volume-title":"Proceedings of the 34th Annual IEEE\/ACM International Symposium on Micro-architecture (MICRO-34)","author":"Tullsen D. M.","key":"e_1_2_1_52_1","unstructured":"Tullsen , D. M. and Brown , J . 2001. Handling long-latency loads in a simultaneous multi- threading processor . In Proceedings of the 34th Annual IEEE\/ACM International Symposium on Micro-architecture (MICRO-34) . ACM, 318--327. Tullsen, D. M. and Brown, J. 2001. Handling long-latency loads in a simultaneous multi- threading processor. In Proceedings of the 34th Annual IEEE\/ACM International Symposium on Micro-architecture (MICRO-34). ACM, 318--327."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1985.1676564"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/885651.781057"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1086297.1086328"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1543753.1543755","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1543753.1543755","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:30:20Z","timestamp":1750253420000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1543753.1543755"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,6]]},"references-count":55,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2009,6]]}},"alternative-id":["10.1145\/1543753.1543755"],"URL":"https:\/\/doi.org\/10.1145\/1543753.1543755","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2009,6]]},"assertion":[{"value":"2008-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-07-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}