{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:17:29Z","timestamp":1763468249760,"version":"3.41.0"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,1,9]],"date-time":"2015-01-09T00:00:00Z","timestamp":1420761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Air Force Office of Scientific Research under AFOSR Award No. FA9550-12-1-0476"},{"name":"DoD High Performance Computing Modernization Program at the AFRL, ARL and ERDC DoD Supercomputing Resource Centers"},{"name":"HPCMP's PETTT program (Contract No: GS04T09DBC0017 though DRC)"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,1,9]]},"abstract":"<jats:p>This work presents an end-to-end methodology for quantifying the performance and power benefits of simultaneous multithreading (SMT) for HPC centers and applies this methodology to a production system and workload. Ultimately, SMT\u2019s value system-wide depends on whether users effectively employ SMT at the application level. However, predicting SMT\u2019s benefit for HPC applications is challenging; by doubling the number of threads, the application\u2019s characteristics may change. This work proposes statistical modeling techniques to predict the speedup SMT confers to HPC applications. 
This approach, accurate to within 8%, uses only lightweight, transparent performance monitors collected during a single run of the application.<\/jats:p>","DOI":"10.1145\/2687651","type":"journal-article","created":{"date-parts":[[2015,1,12]],"date-time":"2015-01-12T20:02:10Z","timestamp":1421092930000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Making the Most of SMT in HPC"],"prefix":"10.1145","volume":"11","author":[{"given":"Leo","family":"Porter","sequence":"first","affiliation":[{"name":"EP Analytics and the University of California, San Diego"}]},{"given":"Michael A.","family":"Laurenzano","sequence":"additional","affiliation":[{"name":"EP Analytics and the University of Michigan, Ann Arbor"}]},{"given":"Ananta","family":"Tiwari","sequence":"additional","affiliation":[{"name":"EP Analytics and the San Diego Supercomputer Center"}]},{"given":"Adam","family":"Jundt","sequence":"additional","affiliation":[{"name":"EP Analytics"}]},{"given":"William A.","family":"Ward, Jr.","sequence":"additional","affiliation":[{"name":"Department of Defense HPC Modernization Program"}]},{"given":"Roy","family":"Campbell","sequence":"additional","affiliation":[{"name":"Department of Defense HPC Modernization Program"}]},{"given":"Laura","family":"Carrington","sequence":"additional","affiliation":[{"name":"EP Analytics and the San Diego Supercomputer Center"}]}],"member":"320","published-online":{"date-parts":[[2015,1,9]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Lenstra","author":"Aarts Emile H. L.","year":"1997","unstructured":"Emile H. L. Aarts and Jan K . Lenstra . 1997 . Local Search in Combinatorial Optimization. Princeton University Press . Emile H. L. Aarts and Jan K. Lenstra. 1997. Local Search in Combinatorial Optimization. 
Princeton University Press."},{"volume-title":"NERSC-6 Workload Analysis and Benchmark Selection Process","author":"Antypas Katie","key":"e_1_2_1_2_1","unstructured":"Katie Antypas, John Shalf, and Harvey Wasserman. 2008. NERSC-6 Workload Analysis and Benchmark Selection Process. Technical Report, Lawrence Berkeley National Laboratory."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/125826.125925"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375527.1375580"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454128"},{"key":"e_1_2_1_6_1","article-title":"The OpenCV library","volume":"25","author":"Bradski Gary","year":"2000","unstructured":"Gary Bradski. 2000. The OpenCV library. Doctor Dobbs Journal 25, 11 (2000).","journal-title":"Doctor Dobbs Journal"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2005.33"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626413400082"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.17"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2004.1303311"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmarsys.2005.09.016"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.25"},{"volume-title":"Proceedings of the 4th International Workshop on Performance Modeling, Benchmarking and Simulation.","author":"Cordery M. J.","key":"e_1_2_1_13_1","unstructured":"M. J. Cordery , B. Austin , H. J. Wassermann , C. S. Daley , N. J. Wright , S. D. Hammond , and D. Doerfler . 2013. 
Analysis of Cray XC30 Performance Using Trinity-NERSC-8 Benchmarks and Comparison with Cray XE6 and IBM BG\/Q . In Proceedings of the 4th International Workshop on Performance Modeling, Benchmarking and Simulation. M. J. Cordery, B. Austin, H. J. Wassermann, C. S. Daley, N. J. Wright, S. D. Hammond, and D. Doerfler. 2013. Analysis of Cray XC30 Performance Using Trinity-NERSC-8 Benchmarks and Comparison with Cray XE6 and IBM BG\/Q. In Proceedings of the 4th International Workshop on Performance Modeling, Benchmarking and Simulation."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183401.1183426"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/QEST.2005.16"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2007.115"},{"volume-title":"Proceedings of the 20th International Conference on Parallel and Distributed Processing.","author":"DeVuyst Matthew","key":"e_1_2_1_17_1","unstructured":"Matthew DeVuyst, Rakesh Kumar, and Dean M. Tullsen. 2006. Exploiting unbalanced thread scheduling for energy and performance on a cmp of smt processors. In Proceedings of the 20th International Conference on Parallel and Distributed Processing."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015408"},{"key":"e_1_2_1_19_1","volume-title":"Applied Regression Analysis","author":"Draper Norman Richard","unstructured":"Norman Richard Draper and Harry Smith. 1981. Applied Regression Analysis (2nd ed.). John Wiley and Sons.","edition":"2"},{"key":"e_1_2_1_20_1","volume-title":"Hart","author":"Duda Peter E.","year":"1973","unstructured":"Peter E. 
Duda and Richard O . Hart . 1973 . Pattern Classification and Scene Analysis. John Wiley and Sons . Peter E. Duda and Richard O. Hart. 1973. Pattern Classification and Scene Analysis. John Wiley and Sons."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1508244.1508260"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207222.2207223"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541954"},{"key":"e_1_2_1_24_1","volume-title":"Evaluating new architectural features of the Intel\u00ae Xeon\u00ae 7500 processor for HPC workloads. Computer Science 12","author":"Gepner Pawe\u0142","year":"2011","unstructured":"Pawe\u0142 Gepner, David L. Fraser, Micha\u0142 F. Kowalik, and Kazimierz Wa\u0107kowski. 2011. Evaluating new architectural features of the Intel\u00ae Xeon\u00ae 7500 processor for HPC workloads. Computer Science 12 (2011)."},{"volume-title":"Proceedings of the Workshop on Interaction between Operating System and Computer Architecture.","author":"Ryan","key":"e_1_2_1_25_1","unstructured":"Ryan E. Grant and Ahmad Afsahi. 2005. Characterization of multithreaded scientific workloads on simultaneous multithreading intel processors. In Proceedings of the Workshop on Interaction between Operating System and Computer Architecture."},{"key":"e_1_2_1_26_1","volume-title":"Technical Report 0604, EPCC--University of Edinburgh.","author":"Gray Alan","year":"2006","unstructured":"Alan Gray , J. Hein , M. Plummer , A. Sunderland , L. 
Smith , A. Simpson , and A. Trew . 2006 . An Investigation of Simultaneous Multithreading on HPCx . Technical Report 0604, EPCC--University of Edinburgh. Alan Gray, J. Hein, M. Plummer, A. Sunderland, L. Smith, A. Simpson, and A. Trew. 2006. An Investigation of Simultaneous Multithreading on HPCx. Technical Report 0604, EPCC--University of Edinburgh."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.2514\/6.2004-5452"},{"volume-title":"Principal Component Analysis","author":"Jolliffe Ian","key":"e_1_2_1_28_1","unstructured":"Ian Jolliffe. 2005. Principal Component Analysis. Wiley Online Library."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.38"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/582034.582071"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342005056114"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2003.1196115"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Michael A. Laurenzano Mitesh Meswani Laura Carrington Allan Snavely Mustafa M. Tikir and Stephen Poole. 2011. Reducing energy usage with memory and computation-aware dynamic frequency scaling. In Euro-Par 2011 Parallel Processing.   Michael A. Laurenzano Mitesh Meswani Laura Carrington Allan Snavely Mustafa M. Tikir and Stephen Poole. 2011. Reducing energy usage with memory and computation-aware dynamic frequency scaling. 
In Euro-Par 2011 Parallel Processing.","DOI":"10.1007\/978-3-642-23400-2_9"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1229428.1229479"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/942791.943025"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1012888.1005691"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/1148882.1148888"},{"key":"e_1_2_1_38_1","unstructured":"Hans Meuer Erich Strohmaier Jack Dongarra and Horst Simon. 2014. The Top 500 List. Retrieved from http:\/\/www.top500.org."},{"key":"e_1_2_1_39_1","volume-title":"Boisseau","author":"Milfeld Kent F.","year":"2003","unstructured":"Kent F. Milfeld, Chona S. Guiang, Avijit Purkayastha, and John R. Boisseau. 2003. Exploring the effects of hyper-threading on scientific applications. Cray User Group 2003 112 (2003)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2005.74"},{"key":"e_1_2_1_41_1","unstructured":"Steve Plimpton Paul Crozier and Aidan Thompson. 2007. LAMMPS-large-scale Atomic\/Molecular Massively Parallel Simulator. Sandia National Laboratories."},{"key":"e_1_2_1_42_1","volume-title":"Polybench: The Polyhedral Benchmark Suite.","author":"Pouchet Louis-No\u00ebl","year":"2012","unstructured":"Louis-No\u00ebl Pouchet . 2012 . Polybench: The Polyhedral Benchmark Suite. Retrieved from http:\/\/www.cs.ucla.edu\/~pouchet\/software\/polybench\/. Louis-No\u00ebl Pouchet. 2012. Polybench: The Polyhedral Benchmark Suite. 
Retrieved from http:\/\/www.cs.ucla.edu\/~pouchet\/software\/polybench\/."},{"volume-title":"Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques.","author":"Steven","key":"e_1_2_1_43_1","unstructured":"Steven E. Raasch and Steven K. Reinhardt. 2003. The impact of resource partitioning on SMT processors. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1839667.1839671"},{"key":"e_1_2_1_45_1","unstructured":"RuleQuest Research. 2012. Data Mining with Cubist. Retrieved from http:\/\/rulequest.com\/cubist-info.html."},{"key":"e_1_2_1_46_1","volume-title":"Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications. White paper","author":"Saini Subhash","year":"2013","unstructured":"Subhash Saini , Johnny Chang , and Haoqiang Jin . 2013. Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications. White paper , NASA Ames Research Center . ( 2013 ). Subhash Saini, Johnny Chang, and Haoqiang Jin. 2013. Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications. White paper, NASA Ames Research Center. 
(2013)."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2011.6152743"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2039252.2039262"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2008.7476555"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/762761.762785"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/378993.379244"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/511334.511343"},{"volume-title":"Watts up?","author":"ThinkTank Energy Products Inc. 2014.","key":"e_1_2_1_53_1","unstructured":"ThinkTank Energy Products Inc. 2014. Watts up? Product. Retrieved from http:\/\/www.wattsupmeters.com."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.5555\/838237.838732"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03869-3_16"},{"volume-title":"Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques.","author":"Tuck Nathan","key":"e_1_2_1_56_1","unstructured":"Nathan Tuck and Dean M. Tullsen. 2003. Initial observations of the simultaneous multithreading Pentium 4 processor. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques."},{"volume-title":"Proceedings of the 34th International Symposium on Microarchitecture.","author":"Dean","key":"e_1_2_1_57_1","unstructured":"Dean M. Tullsen and Jeffery A. Brown. 2001. Handling long-latency loads in a simultaneous multithreading processor . 
In Proceedings of the 34th International Symposium on Microarchitecture. Dean M. Tullsen and Jeffery A. Brown. 2001. Handling long-latency loads in a simultaneous multithreading processor. In Proceedings of the 34th International Symposium on Microarchitecture."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/232973.232993"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/225830.224449"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.5555\/2523721.2523746"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454148"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2687651","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2687651","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:12:14Z","timestamp":1750227134000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2687651"}},"subtitle":["System- and Application-Level Perspectives"],"short-title":[],"issued":{"date-parts":[[2015,1,9]]},"references-count":61,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,1,9]]}},"alternative-id":["10.1145\/2687651"],"URL":"https:\/\/doi.org\/10.1145\/2687651","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2015,1,9]]},"assertion":[{"value":"2014-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2015-01-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}