{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T05:41:35Z","timestamp":1741153295668,"version":"3.38.0"},"reference-count":41,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2014,3,21]],"date-time":"2014-03-21T00:00:00Z","timestamp":1395360000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2014,8]]},"abstract":"<jats:p> Graphics processing units (GPUs) have become widely accepted as the computing platform of choice in many high performance computing domains. The availability of programming standards such as OpenCL are used to leverage the inherent parallelism offered by GPUs. Source code optimizations such as loop unrolling and tiling when targeted to heterogeneous applications have reported large gains in performance. However, given the power consumption of GPUs, platforms can exhaust their power budgets quickly. Better solutions are needed to effectively exploit the power-efficiency available on heterogeneous systems. In this work, we evaluate the power\/performance efficiency of different optimizations used on heterogeneous applications. We analyze the power\/performance trade-off by evaluating energy consumption of the optimizations. We compare the performance of different optimization techniques on four different fast Fourier transform implementations. Our study covers discrete GPUs, shared memory GPUs (APUs) and low power system-on-chip (SoC) devices, and includes hardware from AMD (Llano APUs and the Southern Islands GPU), Nvidia (Kepler), Intel (Ivy Bridge) and Qualcomm (Snapdragon S4) as test platforms. The study identifies the architectural and algorithmic factors which can most impact power consumption. We explore a range of application optimizations which show an increase in power consumption by 27%, but result in more than 1.8 \u00d7 increase in speed of performance. We observe up to an 18% reduction in power consumption due to reduced kernel calls across FFT implementations. We also observe an 11% variation in energy consumption among different optimizations. We highlight how different optimizations can improve the execution performance of a heterogeneous application, but also impact the power efficiency of the application. More importantly, we demonstrate that different algorithms implementing the same fundamental function (FFT) can perform with vast differences based on the target hardware and associated application design. <\/jats:p>","DOI":"10.1177\/1094342014526907","type":"journal-article","created":{"date-parts":[[2014,3,22]],"date-time":"2014-03-22T04:16:45Z","timestamp":1395461805000},"page":"319-334","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":6,"title":["Analyzing power efficiency of optimization techniques and algorithm design methods for applications on heterogeneous platforms"],"prefix":"10.1177","volume":"28","author":[{"given":"Yash","family":"Ukidave","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA"}]},{"given":"Amir Kavyan","family":"Ziabari","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA"}]},{"given":"Perhaad","family":"Mistry","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA"}]},{"given":"Gunar","family":"Schirner","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA"}]},{"given":"David","family":"Kaeli","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA"}]}],"member":"179","published-online":{"date-parts":[[2014,3,21]]},"reference":[{"key":"bibr1-1094342014526907","unstructured":"AMD (n.d.) AMD SDK (formerly ATI Stream). Available at: http:\/\/developer.amd.com\/gpu\/AMDAPPSDK\/."},{"key":"bibr2-1094342014526907","unstructured":"AMD (n.d.) clAmdfft, OpenCL FFT library from AMD. Available at: http:\/\/www.bealto.com\/gpu-fft.html."},{"key":"bibr3-1094342014526907","unstructured":"Apple (n.d.) Apple implementation of FFT using OpenCL. Available at: https:\/\/developer.apple.com\/library\/mac\/samplecode\/OpenCL_FFT\/Introduction\/Intro.html"},{"key":"bibr4-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919648"},{"volume-title":"AMD Fusion Developer Summit","year":"2011","author":"Boudier P","key":"bibr5-1094342014526907"},{"key":"bibr6-1094342014526907","unstructured":"BSquare (n.d.) Qualcomm Snapdragon S4 Pro APQ8064 MDP tablet datasheet. Available at: http:\/\/www.bsquare.com\/Documents\/APQ8064%20MDP%20Tablet.pdf."},{"key":"bibr7-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-01970-8_92"},{"key":"bibr8-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2012.6176876"},{"key":"bibr9-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735702"},{"key":"bibr10-1094342014526907","first-page":"831","volume":"3","author":"Deschizeaux B","year":"2007","journal-title":"GPU Gems"},{"key":"bibr11-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1016\/0165-1684(90)90158-U"},{"key":"bibr12-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840301"},{"volume-title":"Heterogeneous Computing with OpenCL","year":"2011","author":"Gaster B","key":"bibr13-1094342014526907"},{"key":"bibr14-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5213922"},{"key":"bibr15-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/2348543.2348587"},{"key":"bibr16-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/1555815.1555775"},{"key":"bibr17-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/1513895.1513903"},{"key":"bibr18-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2010.107"},{"key":"bibr19-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/SAAHPC.2012.26"},{"key":"bibr20-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161248"},{"volume-title":"AMD Fusion Developer Summit","year":"2011","author":"Mantor M","key":"bibr21-1094342014526907"},{"key":"bibr22-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/GREENCOMP.2010.5598315"},{"key":"bibr23-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.41"},{"key":"bibr24-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5213210"},{"key":"bibr25-1094342014526907","unstructured":"Nvidia (2010) Cufft library. Available at: https:\/\/developer.nvidia.com\/cufft."},{"key":"bibr26-1094342014526907","unstructured":"Nvidia (2012) Whitepaper on NVIDIA GeForce GTX 680. Technical report."},{"key":"bibr27-1094342014526907","unstructured":"Nvidia (n.d.) Nvidia management library (NVML) Available at: http:\/\/developer.nvidia.com\/cuda\/nvidia-management-library-nvml."},{"key":"bibr28-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2008.917757"},{"volume-title":"Programming Techniques For High-Performance Graphics And General-Purpose Computation","year":"2005","author":"Pharr M","key":"bibr29-1094342014526907"},{"key":"bibr30-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2007.26"},{"key":"bibr31-1094342014526907","unstructured":"Rofouei M, Stathopoulos T, Ryffel S, (2008) Energy-aware high performance computing with graphic processing units. In: Proceedings of the 2008 conference on power aware computing and systems, San Diego, California, USA, December 2008. Available at: https:\/\/www.usenix.org\/conference\/hotpower-08\/energy-aware-high-performance-computing-graphic-processing-units."},{"key":"bibr32-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/1345206.1345220"},{"key":"bibr33-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/2212908.2212924"},{"key":"bibr34-1094342014526907","first-page":"61801","volume":"51","author":"Stone S","year":"2008","journal-title":"Urbana"},{"key":"bibr35-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/PDCAT.2009.65"},{"key":"bibr36-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/2082156.2082159"},{"key":"bibr37-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370865"},{"key":"bibr38-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2013.6557174"},{"key":"bibr39-1094342014526907","unstructured":"Volkov V, Kazian B (2008) Fitting FFT onto the G80 architecture. Available at: http:\/\/www.cs.berkeley.edu\/kubitron\/courses\/cs258-S08\/projects\/reports\/project6_report.pdf."},{"key":"bibr40-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"bibr41-1094342014526907","doi-asserted-by":"publisher","DOI":"10.1145\/337292.337436"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014526907","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342014526907","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014526907","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T16:40:00Z","timestamp":1741106400000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342014526907"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,3,21]]},"references-count":41,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.1177\/1094342014526907"],"URL":"https:\/\/doi.org\/10.1177\/1094342014526907","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2014,3,21]]}}}