{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:10:32Z","timestamp":1750306232221,"version":"3.41.0"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2017,4,21]],"date-time":"2017-04-21T00:00:00Z","timestamp":1492732800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CAREER-1253024, CCF-1318826, CNS-1421022 and CNS-1421068"],"award-info":[{"award-number":["CAREER-1253024, CCF-1318826, CNS-1421022 and CNS-1421068"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2017,7,31]]},"abstract":"<jats:p>The emergence of GPGPU applications, bolstered by flexible GPU programming platforms, has created a tremendous challenge in maintaining high energy efficiency in modern GPUs. In this article, we demonstrate that customizing a Streaming Multiprocessor (SM) of a GPU at a lower frequency is significantly more energy efficient compared to employing DVFS on an SM designed for a high-frequency operation. Using a system-level CAD technique, we propose <jats:italic>SSAGA\u2014Streaming Multiprocessors Synthesized for Asymmetric GPGPU Applications<\/jats:italic>\u2014an energy-efficient GPU design paradigm. SSAGA creates architecturally identical SM cores, customized for different voltage-frequency domains. 
Our rigorous cross-layer methodology demonstrates an average of 20% improvement in energy efficiency over a spatially multitasking GPU across a range of GPGPU applications.<\/jats:p>","DOI":"10.1145\/3014163","type":"journal-article","created":{"date-parts":[[2017,4,21]],"date-time":"2017-04-21T12:51:10Z","timestamp":1492779070000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["SSAGA"],"prefix":"10.1145","volume":"22","author":[{"given":"Shamik","family":"Saha","sequence":"first","affiliation":[{"name":"USU BRIDGE Lab, Electrical and Computer Engineering, Utah State University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Prabal","family":"Basu","sequence":"additional","affiliation":[{"name":"USU BRIDGE Lab, Electrical and Computer Engineering, Utah State University, Logan, UT"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chidhambaranathan","family":"Rajamanikkam","sequence":"additional","affiliation":[{"name":"USU BRIDGE Lab, Electrical and Computer Engineering, Utah State University, Logan, UT"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aatreyi","family":"Bal","sequence":"additional","affiliation":[{"name":"USU BRIDGE Lab, Electrical and Computer Engineering, Utah State University, Logan, UT"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Koushik","family":"Chakraborty","sequence":"additional","affiliation":[{"name":"USU BRIDGE Lab, Electrical and Computer Engineering, Utah State University, Logan, UT"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sanghamitra","family":"Roy","sequence":"additional","affiliation":[{"name":"USU BRIDGE Lab, Electrical and Computer Engineering, Utah State University, Logan, UT"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,4,21]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Advanced 
Micro Devices (AMD). 2016. AMD Accelerated Parallel Processing (APP) Software Development Kit. http:\/\/developer.amd.com\/sdks\/amdappsdk\/."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2012.6168946"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"P. Aguilera J. Lee A. F. Farahani K. Morrow M. J. Schulte and N. S. Kim. 2014a. Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking. In IEEE\/ACM Design Automation & Test in Europe (DATE\u201914). 1--6.","DOI":"10.7873\/DATE.2014.189"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASPDAC.2014.6742976"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2008.917719"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1840845.1840898"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2593117"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2011.5763030"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629911.1630120"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.98"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2003.810058"},{"volume-title":"Proceedings of the 6th Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD\u201907)","author":"Huang 
W.","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/996566.996800"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2015.7054182"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/GreenCom-CPSCom.2010.143"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"J. Lee V. Sathisha M. Schulte K. Compton and N. S. Kim. 2011. Improving throughput of power-constrained GPUs using dynamic voltage\/frequency and core scaling. In Parallel Architectures and Compilation Techniques (PACT\u201911). 111--120.","DOI":"10.1109\/PACT.2011.17"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCE.2014.6776031"},{"volume-title":"Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (USENIXATC\u201911)","author":"Kato S.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"M. Kim K. Kim J. Geraci and S. Hong. 2014. Utilization-aware load balancing for the energy efficient operation of the big.LITTLE processor. In IEEE\/ACM Design Automation & Test in Europe (DATE\u201914). 
223:1--223:4.","DOI":"10.7873\/DATE2014.236"},{"volume-title":"Proceedings of High Performance Computer Architecture (HPCA\u201908)","author":"Kim W.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2013.6657064"},{"volume-title":"Microarchitecture","year":"2003","author":"Kumar R.","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629911.1629926"},{"volume-title":"Proceedings of High Performance Computer Architecture (HPCA\u201910)","author":"Li T.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2014.2313342"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2013.6557150"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2003.1253581"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2525526.2525852"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2553062.2553067"},{"key":"e_1_2_1_30_1","unstructured":"M. Shebanow. 2013. An evolution of mobile graphics. Keynote Talk at High Performance Graphics."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531793.1531804"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/9.119632"},{"key":"e_1_2_1_33_1","first-page":"482","article-title":"An overview of the simultaneous perturbation method for efficient optimization","volume":"19","author":"Spall J. C.","year":"1998","journal-title":"Johns Hopkins Apl. Technical Digest"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370865"},{"key":"e_1_2_1_35_1","unstructured":"Universidad de Costa Rica. 2009. Theia GPU. 
http:\/\/opencores.org\/project theia_gpu."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2019608.2019612"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2014.34"},{"volume-title":"Proceedings of High Performance Computer Architecture (HPCA\u201916)","author":"Wang Z.","key":"e_1_2_1_38_1"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001161"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"D. You and K.-S. Chung. 2014. Quality of service-aware dynamic voltage and frequency scaling for embedded GPUs. In IEEE Computer Architecture Letters. 66--69.","DOI":"10.1109\/LCA.2014.2319079"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3014163","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3014163","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3014163","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:23:16Z","timestamp":1750220596000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3014163"}},"subtitle":["SMs Synthesized for Asymmetric GPGPU 
Applications"],"short-title":[],"issued":{"date-parts":[[2017,4,21]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,7,31]]}},"alternative-id":["10.1145\/3014163"],"URL":"https:\/\/doi.org\/10.1145\/3014163","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2017,4,21]]},"assertion":[{"value":"2016-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-04-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}