{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T01:41:17Z","timestamp":1773020477971,"version":"3.50.1"},"reference-count":173,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2016,9,9]],"date-time":"2016-09-09T00:00:00Z","timestamp":1473379200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2017,11]]},"abstract":"<jats:p> Energy consumption is one of the top challenges for achieving the next generation of supercomputing. Codesign of hardware and software is critical for improving energy efficiency (EE) for future large-scale systems. Many architectural power-saving techniques have been developed, and most hardware components are approaching physical limits. Accordingly, parallel computing software, including both applications and systems, should exploit power-saving hardware innovations and manage efficient energy use. In addition, new power-aware parallel computing methods are essential to decrease energy usage further. This article surveys software-based methods that aim to improve EE for parallel computing. It reviews the methods that exploit the characteristics of parallel scientific applications, including load imbalance and mixed precision of floating-point (FP) calculations, to improve EE. In addition, this article summarizes widely used methods to improve power usage at different granularities, such as the whole system and per application. In particular, it describes the most important techniques to measure and to achieve energy-efficient usage of various parallel computing facilities, including processors, memories, and networks. 
Overall, this article reviews the state-of-the-art of energy-efficient methods for parallel computing to motivate researchers to achieve optimal parallel computing under a power budget constraint. <\/jats:p>","DOI":"10.1177\/1094342016665471","type":"journal-article","created":{"date-parts":[[2016,9,10]],"date-time":"2016-09-10T01:04:09Z","timestamp":1473469449000},"page":"517-549","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":36,"title":["A survey on software methods to improve the energy efficiency of parallel computing"],"prefix":"10.1177","volume":"31","author":[{"given":"Chao","family":"Jin","sequence":"first","affiliation":[{"name":"Research Computing Center, The University of Queensland, Brisbane, Queensland, Australia"},{"name":"Faculty of Information Technology, Monash University, Clayton, Victoria, Australia"}]},{"given":"Bronis R","family":"de Supinski","sequence":"additional","affiliation":[{"name":"Lawrence Livermore National Laboratory, Livermore, California, USA"}]},{"given":"David","family":"Abramson","sequence":"additional","affiliation":[{"name":"Research Computing Center, The University of Queensland, Brisbane, Queensland, Australia"},{"name":"School of Information Technology and Electrical Engineering, The University of Queensland"}]},{"given":"Heidi","family":"Poxon","sequence":"additional","affiliation":[{"name":"Cray Inc., Saint Paul, Minnesota, USA"}]},{"given":"Luiz","family":"DeRose","sequence":"additional","affiliation":[{"name":"Cray Inc., Saint Paul, Minnesota, USA"}]},{"given":"Minh Ngoc","family":"Dinh","sequence":"additional","affiliation":[{"name":"Research Computing Center, The University of Queensland, Brisbane, Queensland, Australia"}]},{"given":"Mark","family":"Endrei","sequence":"additional","affiliation":[{"name":"Research Computing Center, The University of Queensland, Brisbane, Queensland, Australia"},{"name":"School of Information Technology and 
Electrical Engineering, The University of Queensland"}]},{"given":"Elizabeth R","family":"Jessup","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Colorado Boulder, Boulder, Colorado, USA"}]}],"member":"179","published-online":{"date-parts":[[2016,9,9]]},"reference":[{"key":"bibr1-1094342016665471","volume-title":"Proceedings of the 2012 Workshop on Power-Aware Computing and Systems (HotPower\u201912)","author":"Abe Y","year":"2012"},{"key":"bibr2-1094342016665471","unstructured":"Alan I, Arslan E, Kosar T (2015) Energy-aware data transfer algorithms. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201915), Austin, TX, 15\u201320 November 2015. New York, NY: ACM Press."},{"key":"bibr3-1094342016665471","doi-asserted-by":"crossref","unstructured":"Alonso M, Coll S, Martinez J-M, (2006) Dynamic power saving in fat-tree interconnection networks using on\/off link. In: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201906), Rhodes Island, Greece, 25\u201329 April 2006, pp. 299\u2013307. IEEE Press.","DOI":"10.1109\/IPDPS.2006.1639599"},{"key":"bibr4-1094342016665471","author":"Amarasinghe S","year":"2009","journal-title":"Exascale Software Study: Software Challenges in Extreme Scale Systems. DARPA Report"},{"key":"bibr5-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/s00450-010-0124-2"},{"key":"bibr6-1094342016665471","volume-title":"The top ten exascale research challenges. ASCAC (Advanced Scientific Computing Advisory Committee) Subcommittee Report","author":"ASCAC Subcommittee","year":"2014"},{"key":"bibr7-1094342016665471","unstructured":"Avelar V, Azevedo D, French A (2014) PUE\u2122: a comprehensive examination of the metric. TheGreenGrid White Paper #49. 
ASHRAE Press."},{"key":"bibr8-1094342016665471","doi-asserted-by":"crossref","unstructured":"Baboulina M, Donfacka S, Dongarra J, (2012) A class of communication-avoiding algorithms for solving general dense linear systems on CPU\/GPU parallel machines. In: Proceedings of the International Conference on Computational Science, (ICCS 2012), Omaha, Nebraska, 4\u20136 June 2012, pp. 17\u201326. Elsevier Press.","DOI":"10.1016\/j.procs.2012.04.003"},{"key":"bibr9-1094342016665471","doi-asserted-by":"crossref","unstructured":"Baek W, Chilimbi TM (2010) Green: a framework for supporting energy-conscious programming using controlled approximation. In: Proceedings of the 31st ACM SIGPLAN conference on programming language design and implementation (PLDI\u201910), Toronto, Ontario, Canada, 5\u201310 June 2010, pp. 198\u2013209. New York, NY: ACM Press.","DOI":"10.1145\/1806596.1806620"},{"key":"bibr10-1094342016665471","unstructured":"Bailey PE, Marathe A, Lowenthal DK, (2015) Finding the limits of power-constrained application performance. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201915), Austin, TX, 15\u201320 November 2015. ACM Press."},{"key":"bibr11-1094342016665471","doi-asserted-by":"crossref","unstructured":"Balaprakash P, Tiwari A, Wild SM (2013) Multi-objective optimization of HPC kernels for performance, power, and energy. In: Proceedings of 4th International Workshop on Performance Modeling, Benchmarking, and Simulation of HPC Systems (PMBS12), Denver, CO, 18 November 2013, pp. 239\u2013260. Springer Press. DOI: http:\/\/www.mcs.anl.gov\/papers\/P4069-0413.pdf","DOI":"10.1007\/978-3-319-10214-6_12"},{"key":"bibr12-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1137\/090769156"},{"key":"bibr13-1094342016665471","unstructured":"Bates NJ, Patterson MK (2013) Achieving the 20\u2009MW target: mobilizing the HPC community to accelerate energy efficient computing. 
In: D\u2019Hollander EH, (eds) Transition of HPC Towards Exascale Computing. IOS Press, pp. 37\u201345."},{"key":"bibr14-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2015.32"},{"key":"bibr15-1094342016665471","unstructured":"Benkner S, Franchetti F, Gerndt HM, (2014) Automatic Application Tuning for HPC Architectures. Dagstuhl Reports, Vol. 3-9. Dagstuhl Press, pp. 214\u2013244. DOI: http:\/\/dx.doi.org\/10.4230\/DagRep.3.9.214"},{"key":"bibr16-1094342016665471","doi-asserted-by":"crossref","unstructured":"Bienia C, Kumar S, Singh JP, (2008) The PARSEC Benchmark Suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201908), Toronto, Canada, 25\u201329 October 2008, pp. 72\u201381. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1454115.1454128","DOI":"10.1145\/1454115.1454128"},{"key":"bibr17-1094342016665471","doi-asserted-by":"crossref","unstructured":"Bingham BD, Greenstreet MR (2008) Computation with energy-time trade-offs: models, algorithms and lower-bounds. In: Proceedings of the 2008 International Symposium on Parallel and Distributed Processing with Applications (ISPA\u201908), Sydney, NSW, 10\u201312 December 2008, pp. 143\u2013152. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ISPA.2008.127","DOI":"10.1109\/ISPA.2008.127"},{"key":"bibr18-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339657"},{"key":"bibr19-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2009.41"},{"key":"bibr20-1094342016665471","doi-asserted-by":"crossref","unstructured":"Choi JW, Bedard D, Fowler R, (2013) A roofline model of energy. In: Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201913), Boston, Massachusetts, USA, 20\u201324 May 2013, pp. 661\u2013672. IEEE Press. 
DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2013.77","DOI":"10.1109\/IPDPS.2013.77"},{"key":"bibr21-1094342016665471","doi-asserted-by":"crossref","unstructured":"Collange S, Defour D, Tisserand A (2009) Power consumption of GPUs from a software perspective. In: Proceedings of the 9th International Conference on Computational Science (ICCS\u201909), Baton Rouge, LA, 25\u201327 May 2009, pp. 914\u2013923. Berlin, Heidelberg: Springer Press. DOI: http:\/\/dx.doi.org\/10.1007\/978-3-642-01970-8_92","DOI":"10.1007\/978-3-642-01970-8_92"},{"key":"bibr22-1094342016665471","doi-asserted-by":"crossref","unstructured":"Conner S, Akioka S, Irwin MJ, (2007) Link shutdown opportunities during collective communications in 3-D torus nets. In: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201907), Long Beach, California, 26\u201330 March 2007, pp. 1\u20138. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2007.370534","DOI":"10.1109\/IPDPS.2007.370534"},{"key":"bibr23-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2007.70804"},{"key":"bibr24-1094342016665471","doi-asserted-by":"crossref","unstructured":"Curtis-Maury M, Dzierwa J, Antonopoulos CD, (2006a) Online strategies for high-performance power-aware thread execution on emerging multiprocessors. In: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201906), Rhodes Island, Greece, 25\u201329 April 2006, pp. 298\u2013307. IEEE Press.","DOI":"10.1109\/IPDPS.2006.1639598"},{"key":"bibr25-1094342016665471","doi-asserted-by":"crossref","unstructured":"Curtis-Maury M, Dzierwa J, Antonopoulos CD, (2006b) Online power-performance adaptation of multithreaded programs using hardware event-based prediction. In: Proceedings of the 20th annual international conference on Supercomputing (ICS\u201906), Cairns, Queensland, Australia, 28 June\u20131 July 2006, pp. 157\u2013166. 
New York, NY: ACM Press.","DOI":"10.1145\/1183401.1183426"},{"key":"bibr26-1094342016665471","doi-asserted-by":"crossref","unstructured":"David H, Gorbatov E, Hanebutte UR, (2010) RAPL: memory power estimation and capping. In: Proceedings of the 16th ACM\/IEEE international symposium on Low power electronics and design (ISLPED\u201910), Austin, TX, 18\u201320 August 2010, pp. 189\u2013194. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1840845.1840883","DOI":"10.1145\/1840845.1840883"},{"key":"bibr27-1094342016665471","first-page":"46","volume":"15","author":"Demmel J","year":"2009","journal-title":"SciDAC Review"},{"key":"bibr28-1094342016665471","doi-asserted-by":"crossref","unstructured":"Demmel J, Gearhart A, Lipshitz B, (2013) Perfect strong scaling using no additional energy. In: Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201913), Boston, Massachusetts, 20\u201324 May 2013, pp. 649\u2013660. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2013.32","DOI":"10.1109\/IPDPS.2013.32"},{"key":"bibr29-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2015.87"},{"key":"bibr30-1094342016665471","doi-asserted-by":"crossref","unstructured":"Dickov B, Pericas M, Carpenter PM, (2014) Software-managed power reduction in InfiniBand links. In: Proceedings of the 2015 International Conference on Parallel Processing (ICPP\u201915), Minneapolis, MN, 9\u201312 September 2014, pp. 311\u2013320. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ICPP.2014.40","DOI":"10.1109\/ICPP.2014.40"},{"key":"bibr31-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1177\/1094342010391989"},{"key":"bibr32-1094342016665471","doi-asserted-by":"crossref","unstructured":"Dongarra J, Ltaief H, Luszczek P, (2012) Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architecture. 
In: Proceedings of the 2nd International Conference on Cloud and Green Computing (CGC\u201912), Xiangtan, Hunan, 1\u20133 November 2012, pp. 274\u2013281. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/CGC.2012.113","DOI":"10.1109\/CGC.2012.113"},{"key":"bibr33-1094342016665471","doi-asserted-by":"crossref","unstructured":"Dreslinski RG, Wieckowski M, Blaauw D, (2010) Near-threshold computing: reclaiming Moore\u2019s law through energy efficient integrated circuits. Proceedings of the IEEE 98(2): 253\u2013266. DOI: http:\/\/dx.doi.org\/10.1109\/JPROC.2009.2034764","DOI":"10.1109\/JPROC.2009.2034764"},{"key":"bibr34-1094342016665471","doi-asserted-by":"crossref","unstructured":"Ellsworth DA, Malony AD, Rountree B, (2015) Dynamic power sharing for higher job throughput. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201915), Austin, TX, 15\u201320 November 2015. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2807591.2807643","DOI":"10.1145\/2807591.2807643"},{"key":"bibr35-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-36612-1_12"},{"key":"bibr36-1094342016665471","doi-asserted-by":"crossref","unstructured":"Enos J, Steffen C, Fullop J, (2010) Quantifying the impact of GPUs on performance and energy efficiency in HPC clusters. In: Proceedings of the 2010 International Green Computing Conference, Chicago, IL, 15\u201318 August 2010, pp. 317\u2013324. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/GREENCOMP.2010.5598297","DOI":"10.1109\/GREENCOMP.2010.5598297"},{"key":"bibr37-1094342016665471","doi-asserted-by":"crossref","unstructured":"Esmaeilzadeh H, Cao T, Yang X, (2011) Looking back on the language and hardware revolutions: measured power, performance, and scaling. In: Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI), Newport Beach, California, 5\u201311 March 2011, pp. 
319\u2013332. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1950365.1950402","DOI":"10.1145\/1950365.1950402"},{"key":"bibr38-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2012.20"},{"key":"bibr39-1094342016665471","doi-asserted-by":"crossref","unstructured":"Etinski M, Corbalan J, Labarta J, (2010) Optimizing Job Performance Under a Given Power Constraint in HPC Centers. In: Proceedings of the 2010 International Green Computing Conference, Chicago, IL, 15\u201318 August 2010, pp. 257\u2013267. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/GREENCOMP.2010.5598303","DOI":"10.1109\/GREENCOMP.2010.5598303"},{"key":"bibr40-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2012.08.001"},{"key":"bibr41-1094342016665471","doi-asserted-by":"crossref","unstructured":"Feng X, Ge R, Cameron KW (2005) Power and energy profiling of scientific applications on distributed systems. In: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201905), Denver, CO, 4\u20138 April 2005, p. 34. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2005.346","DOI":"10.1109\/IPDPS.2005.346"},{"issue":"4","key":"bibr42-1094342016665471","first-page":"110","volume":"9","author":"Fowers J","year":"2013","journal-title":"ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers"},{"key":"bibr43-1094342016665471","doi-asserted-by":"crossref","unstructured":"Freeh VW, Bletsch TK, Rawson FL (2007a) Scaling and packing on a chip multiprocessor. In: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201907), Long Beach, California, 26\u201330 March 2007, pp. 1\u20138. IEEE Press. 
DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2007.370539","DOI":"10.1109\/IPDPS.2007.370539"},{"key":"bibr44-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2007.1026"},{"key":"bibr45-1094342016665471","doi-asserted-by":"crossref","unstructured":"Freeh VW, Pan F, Kappiah N, (2005a) Using multiple energy gears in MPI programs on a power-scalable cluster. In: Proceedings of the 10th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2005), Chicago, IL, 15\u201317 June 2005, pp. 164\u2013173. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1065944.1065967","DOI":"10.1145\/1065944.1065967"},{"key":"bibr46-1094342016665471","doi-asserted-by":"crossref","unstructured":"Freeh VW, Pan F, Kappiah N, (2005b) Exploring the energy-time tradeoff in MPI programs on a power-scalable cluster. In: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201905), Denver, CO, 4\u20138 April 2005. IEEE Press, 4a. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2005.346","DOI":"10.1109\/IPDPS.2005.214"},{"key":"bibr47-1094342016665471","doi-asserted-by":"crossref","unstructured":"Ge R, Cameron KW (2007) Power-aware speedup. In: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201907), Long Beach, California, 26\u201330 March 2007, pp. 1\u201310. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2007.370246","DOI":"10.1109\/IPDPS.2007.370246"},{"key":"bibr48-1094342016665471","doi-asserted-by":"crossref","unstructured":"Ge R, Feng X, Cameron KW (2005) Performance-constrained distributed DVS scheduling for scientific applications on power-aware clusters. In: Proceedings of the 2005 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201905), Seattle, WA, 12\u201318 November 2005, p. 34. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1109\/SC.2005.57","DOI":"10.1109\/SC.2005.57"},{"key":"bibr49-1094342016665471","doi-asserted-by":"crossref","unstructured":"Ge R, Feng X, Feng W-C, (2007) CPU MISER: a performance-directed, run-time system for power-aware clusters. In: Proceedings of the 2007 International Conference on Parallel Processing (ICPP\u201907), XiAn, China, 10\u201314 September 2007, p. 18. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ICPP.2007.29","DOI":"10.1109\/ICPP.2007.29"},{"key":"bibr50-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2009.76"},{"key":"bibr51-1094342016665471","doi-asserted-by":"crossref","unstructured":"Ge R, Vogt R, Majumder J, (2013) Effects of dynamic voltage and frequency scaling on a K20 GPU. In: Proceedings of the 42nd International Conference on Parallel Processing (ICPP\u201913), Lyon, France, 1\u20134 October 2013, pp. 826\u2013833. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ICPP.2013.98","DOI":"10.1109\/ICPP.2013.98"},{"key":"bibr52-1094342016665471","doi-asserted-by":"crossref","unstructured":"Georgiou Y, Cadeau T, Glesser D, (2014) Energy accounting and control with SLURM resource and job management system. In: Chatterjee M, Cao JN, Kothapalli K, (eds) Distributed Computing and Networking. Lecture Notes in Computer Science, Vol. 8314. Berlin: Springer Berlin Heidelberg Press, pp. 96\u2013118. DOI: http:\/\/dx.doi.org\/10.1007\/978-3-642-45249-9_7.","DOI":"10.1007\/978-3-642-45249-9_7"},{"key":"bibr53-1094342016665471","doi-asserted-by":"crossref","unstructured":"Ghosh S, Chandrasekaran S, Chapman B (2012) Energy analysis of parallel scientific kernels on multiple GPUs. In: Proceedings of the 2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC), Argonne, IL, 10\u201311 July 2012, pp. 54\u201363. IEEE Press. 
DOI: http:\/\/dx.doi.org\/10.1109\/SAAHPC.2012.17","DOI":"10.1109\/SAAHPC.2012.17"},{"key":"bibr54-1094342016665471","doi-asserted-by":"crossref","unstructured":"Grant RE, Afsahi A (2006) Power-performance efficiency of asymmetric multiprocessors for multi-threaded scientific applications. In: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201906), Rhodes Island, Greece, 25\u201329 April 2006, pp. 300\u2013308. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2006.1639601","DOI":"10.1109\/IPDPS.2006.1639601"},{"key":"bibr55-1094342016665471","author":"Greenhalgh P","year":"2011","journal-title":"ARM Whitepaper"},{"key":"bibr56-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1137\/100788926"},{"key":"bibr57-1094342016665471","doi-asserted-by":"publisher","DOI":"10.2172\/1331496"},{"key":"bibr58-1094342016665471","doi-asserted-by":"crossref","unstructured":"Gschwandtner P, Chalios C, Nikolopoulos DS, (2015) On the potential of significance-driven execution for energy-aware HPC. Computer Science \u2013 Research and Development 30(2): 197\u2013206. DOI: http:\/\/dx.doi.org\/10.1007\/s00450-014-0265-9","DOI":"10.1007\/s00450-014-0265-9"},{"key":"bibr59-1094342016665471","doi-asserted-by":"crossref","unstructured":"Gschwandtner P, Durillo JJ, Fahringer T (2014) Multi-objective auto-tuning with Insieme: optimization and trade-off analysis for time, energy and resource usage. In: Proceedings of the 20th European Conference on Parallel Processing (Euro-Par 2014), Porto, Portugal, 25\u201329 August 2014, pp. 87\u201398. Springer International Publishing. DOI: http:\/\/dx.doi.org\/10.1007\/978-3-319-09873-9_8","DOI":"10.1007\/978-3-319-09873-9_8"},{"key":"bibr60-1094342016665471","doi-asserted-by":"crossref","unstructured":"Hackenberg D, Ilsche T, Schone R, (2013) Power measurement techniques on standard compute nodes: a quantitative comparison. 
In: Proceedings of 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, 21\u201323 April 2013, pp. 194\u2013204. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ISPASS.2013.6557170","DOI":"10.1109\/ISPASS.2013.6557170"},{"key":"bibr61-1094342016665471","volume-title":"Proceedings of the 2014 CUG (Cray User Group) meeting","author":"Hart A","year":"2014"},{"key":"bibr62-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/s00450-011-0192-y"},{"key":"bibr201-1094342016665471","doi-asserted-by":"crossref","unstructured":"Hoefler T (2010) Software and hardware techniques for power-efficient HPC networking. Computing in Science & Engineering 12(6): 30\u201337. DOI: http:\/\/dx.doi.org\/10.1109\/MCSE.2010.96","DOI":"10.1109\/MCSE.2010.96"},{"key":"bibr63-1094342016665471","volume-title":"Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. Computer Science and Artificial Intelligence Laboratory Technical Report","author":"Hoffmann H","year":"2009"},{"key":"bibr64-1094342016665471","doi-asserted-by":"crossref","unstructured":"Hoffmann H, Sidiroglou S, Carbin M, (2011) Dynamic knobs for responsive power-aware computing. In: Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI), Newport Beach, California, 5\u201311 March 2011, pp. 199\u2013212. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1950365.1950390","DOI":"10.1145\/1950365.1950390"},{"key":"bibr65-1094342016665471","doi-asserted-by":"crossref","unstructured":"Hong S, Kim H (2010) An integrated GPU power and performance model. In: Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA\u201910), Saint-Malo, France, 19\u201323 June 2010, pp. 280\u2013289. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1145\/1816038.1815998","DOI":"10.1145\/1815961.1815998"},{"key":"bibr66-1094342016665471","doi-asserted-by":"crossref","unstructured":"Hsu C-H, Feng W-C (2005) A power-aware run-time system for high-performance computing. In: Proceedings of the 2005 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201905), Seattle, WA, 12\u201318 November 2005, p. 1. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/SC.2005.57","DOI":"10.1109\/SC.2005.57"},{"key":"bibr67-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-35767-X_6"},{"key":"bibr68-1094342016665471","doi-asserted-by":"crossref","unstructured":"Hsu C-H, Kremer U (2003b) The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction. In: Proceedings of the 24th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201903), San Diego, CA, 8\u201311 June 2003, pp. 38\u201348. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/780822.781137","DOI":"10.1145\/781136.781137"},{"key":"bibr69-1094342016665471","doi-asserted-by":"crossref","unstructured":"Huang S, Xiao S, Feng W (2009) On the energy efficiency of graphics processing units for scientific computing. In: Proceedings of the 23rd International Parallel and Distributed Processing Symposium (IPDPS\u201909), Rome, Italy, 23\u201329 May 2009, pp. 1\u20138. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2009.5160980","DOI":"10.1109\/IPDPS.2009.5160980"},{"key":"bibr70-1094342016665471","unstructured":"IEEE 802.3az (2010) Active\/Idle Toggling with Low Power Idle."},{"key":"bibr71-1094342016665471","doi-asserted-by":"crossref","unstructured":"Inadomi Y, Patki T, Inoue K, (2015) Analyzing and mitigating the impact of manufacturing variability in power-constrained supercomputing. 
In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201915), Austin, TX, 15\u201320 November 2015. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2807591.2807639","DOI":"10.1145\/2807591.2807639"},{"key":"bibr72-1094342016665471","volume-title":"InfiniBand Architecture Specification","author":"InfiniBand Trade Association","year":"2002"},{"key":"bibr73-1094342016665471","volume-title":"System Programming Guide, volume 3B-2 of Intel 64 and IA-32 Architectures Software Developer\u2019s Manual","author":"Intel Corp","year":"2011"},{"key":"bibr74-1094342016665471","volume-title":"Intel\u00ae Xeon\u00ae Processor Specification","author":"Intel Corp","year":"2013"},{"key":"bibr75-1094342016665471","volume-title":"IPMI-Intelligent Platform Management Interface Specification Second Generation","author":"Intel Corp","year":"2013"},{"key":"bibr205-1094342016665471","doi-asserted-by":"crossref","unstructured":"Jana S, Chapman B (2015) Impact of frequency scaling on one sided remote memory accesses. In: Proceedings of the 9th international conference on partitioned global address space programming models (PGAS'15), Washington, DC, 16\u201318 September 2015, pp. 25\u201337. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/PGAS.2015.11","DOI":"10.1109\/PGAS.2015.11"},{"key":"bibr202-1094342016665471","doi-asserted-by":"crossref","unstructured":"Jana S, Schuchart J, Chapman B (2014a) Analysis of energy and performance of RDMA-based data access patterns. In: Proceedings of the 8th international conference on partitioned global address space programming models (PGAS'14). Eugene, OR, 6\u201310 October 2014. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1145\/2676870.2676882","DOI":"10.1145\/2676870.2676882"},{"key":"bibr203-1094342016665471","doi-asserted-by":"crossref","unstructured":"Jana S, Hernandez O, Poole S, (2014b) Power consumption due to data movement in distributed programming models. In: Proceedings of the 20th international conference euro-par 2014 parallel processing, Porto, Portugal, 25\u201329 August 2014, pp. 366\u2013378. Springer Press. DOI: http:\/\/dx.doi.org\/10.1007\/978-3-319-09873-9_31","DOI":"10.1007\/978-3-319-09873-9_31"},{"key":"bibr204-1094342016665471","doi-asserted-by":"crossref","unstructured":"Jana S, Hernandez O, Poole S, (2014c) Analyzing the energy and power consumption of remote memory accesses in the OpenSHMEM model. In: Proceedings of the 1st OpenSHMEM workshop: experiences, implementations and tools, Annapolis, MD, 4\u20136 March 2014, pp. 59\u201373. Springer Press. DOI: http:\/\/dx.doi.org\/10.1007\/978-3-319-05215-1_5","DOI":"10.1007\/978-3-319-05215-1_5"},{"key":"bibr76-1094342016665471","doi-asserted-by":"crossref","unstructured":"Jiao Y, Lin H, Balaji P, (2010) Power and performance characterization of computational kernels on the GPU. In: Proceedings of the 2010 IEEE\/ACM International Conference on Green Computing and Communications & International Conference on Cyber, Physical and Social Computing (GREENCOM-CPSCOM\u201910), Hangzhou, China, 18\u201320 December 2010, pp. 221\u2013228. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/GreenCom-CPSCom.2010.143","DOI":"10.1109\/GreenCom-CPSCom.2010.143"},{"key":"bibr77-1094342016665471","doi-asserted-by":"crossref","unstructured":"Jordan H, Thoman P, Durillo JJ, (2012) A multi-objective auto-tuning framework for parallel codes. In: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201912), Salt Lake City, Utah, 10\u201316 November 2012, pp. 1\u201312. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1109\/SC.2012.7","DOI":"10.1109\/SC.2012.7"},{"key":"bibr78-1094342016665471","doi-asserted-by":"crossref","unstructured":"Kale L, Krishnan S (1993) CHARM++: a portable concurrent object oriented system based on C++. In: Proceedings of the 8th annual conference on object-oriented programming systems, languages, and applications (OOPSLA'93), Washington, DC, 26 September\u201301 October 1993, pp. 91\u2013108. ACM Press. DOI:http:\/\/dx.doi.org\/10.1145\/165854.165874","DOI":"10.1145\/165854.165874"},{"key":"bibr206-1094342016665471","doi-asserted-by":"crossref","unstructured":"Kandalla K, Mancini EP, Sur S, (2010) Designing power-aware collective communication algorithms for InfiniBand clusters. In: Proceedings of the 39th International Conference on Parallel Processing (ICPP'10), San Diego, CA, 13\u201316 September 2010, pp. 218\u2013227. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ICPP.2010.78","DOI":"10.1109\/ICPP.2010.78"},{"key":"bibr79-1094342016665471","doi-asserted-by":"crossref","unstructured":"Kappiah N, Freeh VW, Lowenthal DK (2005) Just in time dynamic voltage scaling: exploiting inter-node slack to save energy in MPI programs. In: Proceedings of the 2005 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201905), Seattle, WA, 12\u201318 November 2005, p. 33. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1109\/SC.2005.57","DOI":"10.1109\/SC.2005.39"},{"key":"bibr80-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2013.71"},{"key":"bibr81-1094342016665471","doi-asserted-by":"publisher","DOI":"10.2200\/S00119ED1V01Y200805CAC004"},{"key":"bibr82-1094342016665471","doi-asserted-by":"crossref","unstructured":"Keramidas G, Spiliopoulos V, Kaxiras S (2010) Interval-based models for run-time DVFS orchestration in superscalar processors. 
In: Proceedings of the 7th ACM international Conference on Computing Frontiers (CF\u201910), Bertinoro, Italy, 17\u201319 May 2010, pp. 287\u2013296. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1787275.1787338","DOI":"10.1145\/1787275.1787338"},{"key":"bibr83-1094342016665471","doi-asserted-by":"crossref","unstructured":"Kestor G, Gioiosa R, Kerbyson D, (2013) Quantifying the energy cost of data movement in scientific applications. In: Proceedings of 2013 IEEE international symposium on workload characterization (IISWC), Portland, OR, 22\u201324 September 2013, pp. 56\u201365. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IISWC.2013.6704670","DOI":"10.1109\/IISWC.2013.6704670"},{"key":"bibr210-1094342016665471","author":"Kogge P","year":"2008","journal-title":"ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems. DARPA Report"},{"key":"bibr84-1094342016665471","doi-asserted-by":"crossref","unstructured":"Korthikanti VA, Agha G (2010) Avoiding energy wastage in parallel applications. In: Proceedings of the 2010 International Green Computing Conference, Chicago, IL, 15\u201318 August 2010, pp. 149\u2013163. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/GREENCOMP.2010.5598314","DOI":"10.1109\/GREENCOMP.2010.5598314"},{"key":"bibr85-1094342016665471","doi-asserted-by":"crossref","unstructured":"Lam MO, Hollingsworth JK, de Supinski BR, (2013) Automatically adapting programs for mixed-precision floating-point computation. In: Proceedings of the 27th annual international conference on supercomputing (ICS'13), Eugene, Oregon, 10\u201314 June 2013, pp. 369\u2013378. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1145\/2464996.2465018","DOI":"10.1145\/2464996.2465018"},{"key":"bibr86-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-4492-2"},{"key":"bibr87-1094342016665471","volume-title":"Proceedings of the 2012 Symposium on High Performance Computing (HPC\u201912)","author":"Laros JH","year":"2012"},{"key":"bibr88-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/IGCC.2013.6604485"},{"key":"bibr89-1094342016665471","doi-asserted-by":"crossref","unstructured":"Lawson B, Smirni E (2005) Power-aware resource allocation in high-end systems via online simulation. In: Proceedings of the 19th Annual International Conference on Supercomputing (ICS\u201905), Boston, MA, 20\u201322 June 2005, pp. 229\u2013238. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1088149.1088179","DOI":"10.1145\/1088149.1088179"},{"key":"bibr90-1094342016665471","doi-asserted-by":"crossref","unstructured":"Leon EA, Karlin I, Grant RE (2015) Optimizing explicit hydrodynamics for power, energy, and performance. In: Proceedings of 2015 IEEE International Conference on Cluster Computing (CLUSTER), Chicago, IL, 8\u201311 September 2015, pp. 11\u201321. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/CLUSTER.2015.12","DOI":"10.1109\/CLUSTER.2015.12"},{"key":"bibr91-1094342016665471","doi-asserted-by":"crossref","unstructured":"Li D, Byna S, Chakradhar S (2011a) Energy-aware workload consolidation on GPU. In: Proceedings of the 40th International Conference Parallel Processing Workshops (ICPPW\u201911), Taipei, Taiwan, China, 13\u201316 September 2011, pp. 389\u2013398. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ICPPW.2011.25","DOI":"10.1109\/ICPPW.2011.25"},{"key":"bibr92-1094342016665471","doi-asserted-by":"crossref","unstructured":"Li B, Chang H-C, Song SL, Su C-Y, (2014) The power-performance tradeoffs of the Intel Xeon Phi on HPC applications. 
In: Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW\u201914), Phoenix, Arizona, 19\u201323 May 2014, pp. 1448\u20131456. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPSW.2014.162","DOI":"10.1109\/IPDPSW.2014.162"},{"key":"bibr93-1094342016665471","doi-asserted-by":"crossref","unstructured":"Li S, Lim K, Faraboschi P, (2011b) System-level integrated server architectures for scale-out datacenters. In: Proceedings of the 44th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-44), Porto Alegre, Brazil, 4\u20137 December 2011, pp. 260\u2013271. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2155620.2155651","DOI":"10.1145\/2155620.2155651"},{"key":"bibr94-1094342016665471","doi-asserted-by":"crossref","unstructured":"Li J, Martinez JF (2006) Dynamic power-performance adaptation of parallel computation on chip multiprocessors. In: Proceedings of the 12th International Symposium on High-Performance Computer Architecture (HPCA\u201906), Austin, TX, 11\u201315 February 2006, pp. 77\u201387. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/HPCA.2006.1598114","DOI":"10.1109\/HPCA.2006.1598114"},{"key":"bibr95-1094342016665471","doi-asserted-by":"crossref","unstructured":"Li D, Nikolopoulos DS, Cameron KW, (2010a) Power-Aware MPI Task Aggregation Prediction for High-End Computing Systems. In Proceedings of the 2010 IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201910), Atlanta, GA, 19\u201323 April 2010, pp. 1\u201312. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2010.5470463","DOI":"10.1109\/IPDPS.2010.5470464"},{"key":"bibr96-1094342016665471","doi-asserted-by":"crossref","unstructured":"Li D, de Supinski BR, Schulz M, (2010b) Hybrid MPI\/OpenMP power-aware computing. In: Proceedings of the 2010 IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201910), Atlanta, GA, 19\u201323 April 2010, pp. 1\u201312. IEEE Press. 
DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2010.5470463","DOI":"10.1109\/IPDPS.2010.5470464"},{"key":"bibr97-1094342016665471","doi-asserted-by":"crossref","unstructured":"Lim MY, Freeh VW, Lowenthal DK (2006) Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs. In: Proceedings of the 2006 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201906), Tampa, FL, 11\u201317 November 2006, p. 14. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1109\/SC.2006.11","DOI":"10.1109\/SC.2006.11"},{"key":"bibr98-1094342016665471","doi-asserted-by":"crossref","unstructured":"Linderman MD, Ho M, Dill DL, (2010) Towards program optimization through automated analysis of numerical precision. In: Proceedings of the 8th Annual IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201910), Toronto, Canada, 24\u201328 April 2010, pp. 230\u2013237. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1772954.1772987","DOI":"10.1145\/1772954.1772987"},{"key":"bibr99-1094342016665471","doi-asserted-by":"crossref","unstructured":"Liu J, Poff D, Abali B (2009) Evaluating high performance communication: a power perspective. In: Proceedings of the 23rd International Conference on Supercomputing (ICS\u201909), Yorktown Heights, NY, 8\u201312 June 2009, pp. 326\u2013337. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1542275.1542322","DOI":"10.1145\/1542275.1542322"},{"key":"bibr100-1094342016665471","doi-asserted-by":"crossref","unstructured":"Luk C-K, Hong S, Kim H (2009) Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: Proceedings of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-42), New York, NY, 12\u201316 December 2009. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1145\/1669112.1669121","DOI":"10.1145\/1669112.1669121"},{"key":"bibr101-1094342016665471","doi-asserted-by":"crossref","unstructured":"Ma K, Li X, Chen W, (2012) GreenGPU: a holistic approach to energy efficiency in GPU-CPU heterogeneous architectures. In: Proceedings of the 41st International Conference on Parallel Processing (ICPP\u201912), Pittsburgh, PA, 10\u201313 September 2012, pp. 48\u201357. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ICPP.2012.31","DOI":"10.1109\/ICPP.2012.31"},{"key":"bibr102-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/s00450-011-0189-6"},{"key":"bibr103-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-20119-1_28"},{"key":"bibr104-1094342016665471","volume-title":"Proceedings of the 2015 CUG (Cray User Group) meeting","author":"Martin SJ","year":"2015"},{"key":"bibr105-1094342016665471","doi-asserted-by":"crossref","unstructured":"Mei X, Yung LS, Zhao K, (2013) A measurement study of GPU DVFS on energy conservation. In: Proceedings of the 2013 Workshop on Power-Aware Computing and Systems (HotPower\u201913), Farmington, Pennsylvania, 3\u20136 November 2013. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2525526.2525847","DOI":"10.1145\/2525526.2525852"},{"key":"bibr106-1094342016665471","doi-asserted-by":"crossref","unstructured":"Miceli R, Civario G, Sikora A, (2012) AutoTune: a plugin-driven approach to the automatic tuning of parallel applications. In: Proceedings of the 11th International Conference on Applied Parallel and Scientific Computing (PARA\u201912), Helsinki, Finland, 10\u201313 June 2012, pp. 328\u2013342. Berlin, Heidelberg: Springer-Verlag. 
DOI: http:\/\/dx.doi.org\/10.1007\/978-3-642-36803-5_24.","DOI":"10.1007\/978-3-642-36803-5_24"},{"key":"bibr107-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1996.1285"},{"key":"bibr108-1094342016665471","doi-asserted-by":"crossref","unstructured":"Minartz T, Ludwig T, Knobloch M, (2011) Managing hardware power saving modes for high performance computing. In: Proceedings of 2011 International on Green Computing Conference and Workshops (IGCC\u201911), Orlando, FL, 25\u201328 July 2011, pp. 1\u20138. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IGCC.2011.6008581","DOI":"10.1109\/IGCC.2011.6008581"},{"key":"bibr109-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1145\/2636342"},{"key":"bibr110-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/s00450-013-0238-4"},{"key":"bibr111-1094342016665471","doi-asserted-by":"crossref","unstructured":"Miwa S, Nakamura H (2015) Profile-based power shifting in interconnection networks with on\/off links. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201915), Austin, TX, 15\u201320 November 2015. ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2807591.2807639","DOI":"10.1145\/2807591.2807639"},{"key":"bibr112-1094342016665471","unstructured":"MPI Forum (2015) MPI: a message-passing interface standard. Version 3.1. High Performance Computing Center Stuttgart."},{"key":"bibr113-1094342016665471","doi-asserted-by":"crossref","unstructured":"Mukhanov L, Nikolopoulos DS, de Supinski BR (2015) ALEA: fine-grain energy profiling with basic block sampling. In: Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT-2015), San Francisco, CA, 18\u201321 October 2015, pp. 87\u201398. IEEE Press. 
DOI: http:\/\/dx.doi.org\/10.1109\/PACT.2015.16","DOI":"10.1109\/PACT.2015.16"},{"key":"bibr114-1094342016665471","first-page":"323","volume-title":"Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201908)","author":"Nedevschi S","year":"2008"},{"key":"bibr115-1094342016665471","volume-title":"TESLA K20 GPU Accelerator Board Specification","author":"NVIDIA Corp","year":"2013"},{"key":"bibr116-1094342016665471","volume-title":"NVML (NVIDIA Management Library) Reference Manual","author":"NVIDIA Corp","year":"2015"},{"key":"bibr117-1094342016665471","author":"OpenMP ARB (Architecture Review Board)","year":"2015","journal-title":"OpenMP Application Program Interface, Version 4.5"},{"key":"bibr118-1094342016665471","doi-asserted-by":"crossref","unstructured":"Patki T, Lowenthal DK, Rountree B, (2013) Exploring hardware overprovisioning in power-constrained, high performance computing. In: Proceedings of the 27th International Conference on Supercomputing (ICS'13), Eugene, Oregon, 10\u201314 June 2013, pp. 173\u2013182. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2464996.2465009.","DOI":"10.1145\/2464996.2465009"},{"key":"bibr119-1094342016665471","doi-asserted-by":"crossref","unstructured":"Patki T, Sasidharan A, Maiterth M, (2015) Practical resource management in power-constrained, high performance computing. In: Proceedings of the 24th IEEE International Symposium on High Performance Distributed Computing (HPDC'15), Portland, Oregon, 15\u201319 June 2015, pp. 121\u2013132. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1145\/2749246.2749262.","DOI":"10.1145\/2749246.2749262"},{"key":"bibr120-1094342016665471","volume-title":"The Berkeley Par Lab: Progress in the Parallel Computing Landscape","author":"Patterson D","year":"2013"},{"key":"bibr121-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-38750-0_28"},{"key":"bibr122-1094342016665471","doi-asserted-by":"crossref","unstructured":"Pedretti K, Olivier SL, Ferreira KB, (2015) Early experiences with node-level power capping on the Cray XC40 platform. In: Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, Austin, TX, 15\u201320 November 2015. ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2834800.2834801","DOI":"10.1145\/2834800.2834801"},{"key":"bibr123-1094342016665471","first-page":"1","author":"Price DC","year":"2015","journal-title":"Computer Science - Research and Development"},{"key":"bibr124-1094342016665471","doi-asserted-by":"crossref","unstructured":"Rahman SMF, Guo J, Bhat J A, (2012) Studying the impact of application-level optimizations on the power consumption of multi-core architectures. In: Proceedings of the 9th Conference on Computing Frontiers (CF\u201912), Cagliari, Italy, 15\u201317 May 2012, pp. 123\u2013132. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1145\/1944862.1944880","DOI":"10.1145\/1944862.1944880"},{"key":"bibr126-1094342016665471","doi-asserted-by":"crossref","unstructured":"Rajovic N, Carpenter P, Gelado I, (2013) Supercomputing with commodity CPUs: are mobile SoCs ready for HPC? In: Proceedings of the 2013 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201913), Denver, CO, 17\u201321 November 2013, pp. 1\u201312. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1145\/2503210.2503281","DOI":"10.1145\/2503210.2503281"},{"key":"bibr127-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2012.12"},{"key":"bibr128-1094342016665471","doi-asserted-by":"crossref","unstructured":"Rountree B, Ahn DH, de Supinski BR, (2012) Beyond DVFS: a first look at performance under a hardware-enforced power bound. In: Proceedings of 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW\u201912), Shanghai, China, 21\u201325 May 2012, pp. 947\u2013953. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPSW.2012.116","DOI":"10.1109\/IPDPSW.2012.116"},{"key":"bibr129-1094342016665471","doi-asserted-by":"crossref","unstructured":"Rountree B, Lowenthal DK, Funk S, (2007) Bounding energy consumption in large-scale MPI programs. In: Proceedings of the 2007 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201907), Reno, NV, 10\u201316 November 2007, pp. 1\u20139. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1145\/1362622.1362688","DOI":"10.1145\/1362622.1362688"},{"key":"bibr130-1094342016665471","doi-asserted-by":"crossref","unstructured":"Rountree B, Lowenthal DK, Schulz M, (2011) Practical performance prediction under dynamic voltage frequency scaling. In: Proceedings of the 2nd International Green Computing Conference (IGCC'11), Orlando, Florida, 25\u201328 July 2011, pp. 1\u20138. IEEE Press. 
DOI: http:\/\/dx.doi.org\/10.1109\/IGCC.2011.6008553","DOI":"10.1109\/IGCC.2011.6008553"},{"key":"bibr131-1094342016665471","doi-asserted-by":"crossref","unstructured":"Rountree B, Lowenthal DK, de Supinski BR, (2009) Adagio: making DVS practical for complex HPC applications. In: Proceedings of the 23rd International Conference on Supercomputing (ICS\u201909), Yorktown Heights, NY, 8\u201312 June 2009, pp. 460\u2013469. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1542275.1542340","DOI":"10.1145\/1542275.1542340"},{"key":"bibr132-1094342016665471","doi-asserted-by":"crossref","unstructured":"Rubio-Gonz\u00e1lez C, Nguyen C, Nguyen HD, (2013) Precimonious: tuning assistant for floating-point precision. In: Proceedings of the 2013 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201913), Denver, CO, 17\u201321 November 2013. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1145\/2510000\/2503296","DOI":"10.1145\/2503210.2503296"},{"key":"bibr133-1094342016665471","doi-asserted-by":"crossref","unstructured":"Sampson A, Dietl W, Fortuna E, Gnanapragasam D, (2011) EnerJ: approximate data types for safe and general low-power computation. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201911), San Jose, CA, 4\u20138 June 2011, pp. 164\u2013174. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/1993316.1993518","DOI":"10.1145\/1993498.1993518"},{"key":"bibr134-1094342016665471","doi-asserted-by":"crossref","unstructured":"Saputra H, Kandemir M, Vijaykrishnan N, (2002) Energy-conscious compilation based on voltage scaling. In: Proceedings of the Joint Conference on Languages, Compilers and Tools for Embedded Systems: Software and Compilers for Embedded Systems (LCTES\/SCOPES\u201902), Berlin, Germany, 19\u201321 June 2002, pp. 2\u201311. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1145\/513829.513832","DOI":"10.1145\/513829.513832"},{"key":"bibr135-1094342016665471","doi-asserted-by":"crossref","unstructured":"Saravanan KP, Carpenter PM, Ramirez A (2013) Power\/performance evaluation of energy efficient Ethernet (EEE) for high performance computing. In: Proceedings of 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, 21\u201323 April 2013, pp. 205\u2013214. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ISPASS.2013.6557171","DOI":"10.1109\/ISPASS.2013.6557171"},{"key":"bibr136-1094342016665471","doi-asserted-by":"crossref","unstructured":"Saravanan KP, Carpenter PM, Ramirez A (2014) A performance perspective on energy efficient HPC links. In: Proceedings of the 28th International Conference on Supercomputing (ICS\u201914), Muenchen, Germany, 10\u201313 June 2014, pp. 313\u2013322. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2597652.2597671","DOI":"10.1145\/2597652.2597671"},{"key":"bibr137-1094342016665471","doi-asserted-by":"crossref","unstructured":"Sarood O, Langer A, Gupta A, (2014) Maximizing throughput of overprovisioned HPC data centers under a strict power budget. In: Proceedings of the 2014 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201914), New Orleans, LA, 16\u201321 November 2014, pp. 807\u2013818. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1109\/SC.2014.71","DOI":"10.1109\/SC.2014.71"},{"key":"bibr138-1094342016665471","doi-asserted-by":"crossref","unstructured":"Sarood O, Langer A, Kale L, (2013) Optimizing power allocation to CPU and memory subsystems in overprovisioned HPC Systems. In: Proceedings of 2013 IEEE International Conference on Cluster Computing (CLUSTER), Indianapolis, IN, 23\u201327 September 2013, pp. 1\u20138. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/CLUSTER.2013.6702684.","DOI":"10.1109\/CLUSTER.2013.6702684"},{"key":"bibr140-1094342016665471","doi-asserted-by":"crossref","unstructured":"Sarood O, Miller P, Totoni E, (2012) \u2018Cool\u2019 load balancing for high performance computing data centers. IEEE Transactions on Computers 61(2): 1752\u20131764. DOI: http:\/\/dx.doi.org\/10.1109\/TC.2012.143","DOI":"10.1109\/TC.2012.143"},{"key":"bibr139-1094342016665471","doi-asserted-by":"crossref","unstructured":"Schkufza E, Sharma R, Aiken A (2014) Stochastic optimization of floating-point programs with tunable precision. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201914), Edinburgh, UK, 9\u201311 June 2014, pp. 53\u201364. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2594291.2594302","DOI":"10.1145\/2594291.2594302"},{"key":"bibr211-1094342016665471","doi-asserted-by":"crossref","unstructured":"Scogland T, Azose J, Rohr D, (2015) Node variability in large-scale power measurements: perspectives from the Green500, Top500 and EEHPCWG. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201915), Austin, TX, 15\u201320 November 2015. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2807591.2807658","DOI":"10.1145\/2807591.2807658"},{"key":"bibr141-1094342016665471","doi-asserted-by":"crossref","unstructured":"Scogland T, Steffen C, Wilde T, (2014) A power-measurement methodology for large-scale, high-performance computing. In: Proceedings of the 5th ACM\/SPEC International Conference on Performance Engineering (ICPE\u201914), Dublin, Ireland, 22\u201326 March 2014, pp. 149\u2013159. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1145\/2568088.2576795","DOI":"10.1145\/2568088.2576795"},{"key":"bibr142-1094342016665471","first-page":"1","volume-title":"Proceedings of the 9th International Conference on High Performance Computing for Computational Science (VECPAR\u201910)","author":"Shalf J","year":"2010"},{"key":"bibr143-1094342016665471","doi-asserted-by":"crossref","unstructured":"Solomonik E, Demmel J (2011) Communication-optimal parallel 2.5 D matrix multiplication LU factorization algorithms. In: Proceedings of the 18th European Conference on Parallel Processing (Euro-Par 2011), Bordeaux, France, 29 August\u20132 September 2011, pp. 90\u2013109. Springer International Publishing. DOI: http:\/\/dx.doi.org\/10.1007\/978-3-642-23397-5_10","DOI":"10.1007\/978-3-642-23397-5_10"},{"key":"bibr144-1094342016665471","doi-asserted-by":"crossref","unstructured":"Song S, Su C-Y, Ge R, (2011) Iso-energy-efficiency: an approach to power-constrained parallel computation. In: Proceedings of the 2011 IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201911), Alaska, USA, 16\u201320 May 2011, pp. 128\u2013139. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2011.22","DOI":"10.1109\/IPDPS.2011.22"},{"key":"bibr146-1094342016665471","doi-asserted-by":"crossref","unstructured":"Suleman MA, Qureshi MK, Patt YN (2008) Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs. In: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIII), Seattle, WA, 1\u20135 March 2008, pp. 277\u2013286. New York, NY: ACM Press. 
DOI: http:\/\/dx.doi.org\/10.1145\/1346281.1346317","DOI":"10.1145\/1346281.1346317"},{"key":"bibr147-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2002.10062"},{"key":"bibr212-1094342016665471","doi-asserted-by":"crossref","unstructured":"Tavarageri S, Sadayappan P (2013) A compiler analysis to determine useful cache size for energy efficiency. In: Proceedings of 27th international symposium on parallel & distributed processing workshops and PhD forum (IPDPSW'13), Cambridge, MA, 20\u201324 May 2013, pp. 923\u2013930. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPSW.2013.268","DOI":"10.1109\/IPDPSW.2013.268"},{"key":"bibr148-1094342016665471","unstructured":"The Green 500 (2015) Available at: http:\/\/www.green500.org\/lists\/green201511 (accessed 20 August 2016)."},{"key":"bibr149-1094342016665471","doi-asserted-by":"crossref","unstructured":"Tiwari A, Laurenzano MA, Carrington L, (2012) Auto-tuning for energy usage in scientific applications. In: Alexander M, D\u2019Ambra P, Belloum A, (eds) Euro-Par 2011: Parallel Processing Workshops. Lecture Notes in Computer Science Vol. 7156, 2012. Springer International Publishing, pp. 178\u2013187. DOI: http:\/\/dx.doi.org\/10.1007\/978-3-642-29740-3_21","DOI":"10.1007\/978-3-642-29740-3_21"},{"key":"bibr150-1094342016665471","unstructured":"Top500 (2015) http:\/\/www.top500.org\/lists\/2015\/11\/ (accessed 20 August 2016)."},{"key":"bibr151-1094342016665471","doi-asserted-by":"crossref","unstructured":"Totoni E, Jain N, Kale L (2014a) Power management of extreme-scale networks with on\/off links in runtime systems. ACM Transactions on Parallel Computing - Special Issue on PPoPP 2012 1(2): 386\u2013393. DOI: http:\/\/dx.doi.org\/10.1145\/2687001","DOI":"10.1145\/2687001"},{"key":"bibr213-1094342016665471","doi-asserted-by":"crossref","unstructured":"Totoni E, Torrellas J, Kale L (2014b) Using an adaptive HPC runtime system to reconfigure the cache hierarchy. 
In: Proceedings of the 2014 international conference for high performance computing, networking, storage and analysis (SC'14), New Orleans, LA, 16\u201321 November 2014, pp. 1047\u20131058. ACM Press. DOI: http:\/\/dx.doi.org\/10.1109\/SC.2014.90","DOI":"10.1109\/SC.2014.90"},{"key":"bibr214-1094342016665471","first-page":"24","volume-title":"Transition of HPC Towards Exascale Computing","author":"Vaidyanathan K","year":"2013"},{"key":"bibr152-1094342016665471","doi-asserted-by":"crossref","unstructured":"Venkatesh A, Kandalla K, Panda DK (2013) Evaluation of energy characteristics of MPI communication primitives with RAPL. In: Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), Boston, Massachusetts, 20\u201324 May 2013, pp. 938\u2013945. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/IPDPSW.2013.243","DOI":"10.1109\/IPDPSW.2013.243"},{"key":"bibr153-1094342016665471","doi-asserted-by":"crossref","unstructured":"Venkatesh A, Vishnu A, Hamidouche K, (2015) A case for application-oblivious energy-efficient MPI runtime. In: Proceedings of the 2015 International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201915), Austin, TX, 15\u201320 November 2015. New York, NY: ACM Press. DOI: http:\/\/dx.doi.org\/10.1145\/2807591.2807658","DOI":"10.1145\/2807591.2807658"},{"key":"bibr215-1094342016665471","doi-asserted-by":"crossref","unstructured":"Vishnu A, Song S, Marquez A, (2013) Designing energy efficient communication runtime systems: a view from PGAS models. The Journal of Supercomputing 63(3): 691\u2013709. DOI: http:\/\/dx.doi.org\/10.1007\/s11227-011-0699-9","DOI":"10.1007\/s11227-011-0699-9"},{"key":"bibr154-1094342016665471","doi-asserted-by":"crossref","unstructured":"Wang G, Ren X (2010) Power-efficient work distribution method for CPU-GPU heterogeneous system. 
In: Proceedings of 2010 International Symposium on Parallel and Distributed Processing with Applications (ISPA\u201910), Taipei, Taiwan, China, 6\u20139 September 2010, pp. 386\u2013393. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/ISPA.2010.22","DOI":"10.1109\/ISPA.2010.22"},{"key":"bibr216-1094342016665471","doi-asserted-by":"crossref","unstructured":"Ware M, Rajamani K, Floyd M, (2010) Architecting for power management: The IBM\u00ae POWER7\u2122 approach. In: Proceedings of the 16th international symposium on high-performance computer architecture (HPCA'16), Bangalore, India, 9\u201314 January 2010, pp. 1\u201311. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/HPCA.2010.5416627","DOI":"10.1109\/HPCA.2010.5416627"},{"key":"bibr155-1094342016665471","volume-title":"Proceedings of the 1st USENIX Conference on Operating Systems Design and Implementation (OSDI\u201994)","author":"Weiser M","year":"1994"},{"key":"bibr156-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1145\/1862648.1862649"},{"key":"bibr157-1094342016665471","volume-title":"High Performance Compilers for Parallel Computing","author":"Wolfe M","year":"1996"},{"key":"bibr158-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2008.494"},{"key":"bibr159-1094342016665471","doi-asserted-by":"crossref","unstructured":"Yoshii K, Iskra K, Gupta R, (2012) Evaluating power monitoring capabilities on IBM Blue Gene\/P and Blue Gene\/Q. In: Proceedings of 2012 IEEE International Conference on Cluster Computing (CLUSTER), Beijing, China, 24\u201328 September 2012, pp. 36\u201344. IEEE Press. DOI: http:\/\/dx.doi.org\/10.1109\/CLUSTER.2012.62","DOI":"10.1109\/CLUSTER.2012.62"},{"key":"bibr160-1094342016665471","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-43779-7_6"},{"key":"bibr220-1094342016665471","doi-asserted-by":"crossref","unstructured":"Zyuban V, Friedrich J, Gonzalez CJ, (2011) Power optimization methodology for the IBM POWER7 microprocessor. 
IBM Journal of Research and Development 55(3): 7:1\u20137:9. DOI: http:\/\/dx.doi.org\/10.1147\/JRD.2011.2110410.","DOI":"10.1147\/JRD.2011.2110410"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016665471","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342016665471","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016665471","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T04:07:23Z","timestamp":1740974843000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342016665471"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,9,9]]},"references-count":173,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2017,11]]}},"alternative-id":["10.1177\/1094342016665471"],"URL":"https:\/\/doi.org\/10.1177\/1094342016665471","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,9,9]]}}}