{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T15:46:05Z","timestamp":1781797565347,"version":"3.54.5"},"reference-count":108,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2019,2,27]],"date-time":"2019-02-27T00:00:00Z","timestamp":1551225600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001807","name":"FAPESP","doi-asserted-by":"crossref","award":["2016\/18929-4"],"award-info":[{"award-number":["2016\/18929-4"]}],"id":[{"id":"10.13039\/501100001807","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2019,3,31]]},"abstract":"<jats:p>Graphics Processing Units (GPUs) are widely used as the accelerator of choice for applications with massively data-parallel tasks. However, recent studies show that GPUs suffer heavily from resource underutilization, which, combined with their large static power consumption, imposes a significant power overhead. The execution units, one of the most power-hungry components of a GPU, frequently experience idleness when (1) an underutilized warp is issued to the execution units, leading to partial lane idleness, and (2) there is no active warp to issue for execution due to warp stalls (e.g., waiting for memory access and synchronization). 
Although the total idle time of execution units is large, it actually comes from short but frequent stalls, leaving little potential for common power-saving techniques such as power-gating.<\/jats:p>\n          <jats:p>\n            In this article, we propose\n            <jats:italic>ITAP<\/jats:italic>\n            , a novel idle-time-aware\n            <jats:italic>power<\/jats:italic>\n            management technique, which aims to effectively reduce the static energy consumption of GPU execution units. By taking advantage of different power management techniques (i.e., power-gating and different levels of voltage scaling),\n            <jats:italic>ITAP<\/jats:italic>\n            employs three static power reduction modes with different overheads and static power reduction capabilities.\n            <jats:italic>ITAP<\/jats:italic>\n            estimates the idle period length of execution units using prediction and peek-ahead techniques in a synergistic way and then applies the most appropriate static power reduction mode based on the estimated idle period length. We design\n            <jats:italic>ITAP<\/jats:italic>\n            to be either power-aggressive or performance-aggressive, but not both at the same time. Our experimental results on several workloads show that the power-aggressive design of\n            <jats:italic>ITAP<\/jats:italic>\n            outperforms the state-of-the-art solution by an average of 27.6% in terms of static energy savings, with less than 2.1% performance overhead. 
However, the performance-aggressive design of\n            <jats:italic>ITAP<\/jats:italic>\n            improves the static energy savings by an average of 16.9%, while keeping the GPU performance almost unaffected (i.e., up to 0.4% performance overhead) compared to the state-of-the-art static energy savings mechanism.\n          <\/jats:p>","DOI":"10.1145\/3291606","type":"journal-article","created":{"date-parts":[[2019,2,27]],"date-time":"2019-02-27T15:00:28Z","timestamp":1551279628000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["ITAP"],"prefix":"10.1145","volume":"16","author":[{"given":"Mohammad","family":"Sadrosadati","sequence":"first","affiliation":[{"name":"Sharif University of Technology, Tehran, Iran"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Seyed Borna","family":"Ehsani","sequence":"additional","affiliation":[{"name":"Sharif University of Technology, Tehran, Iran"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hajar","family":"Falahati","sequence":"additional","affiliation":[{"name":"IPM"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rachata","family":"Ausavarungnirun","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, KMUTNB"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Arash","family":"Tavakkol","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mojtaba","family":"Abaee","sequence":"additional","affiliation":[{"name":"IPM"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lois","family":"Orosa","sequence":"additional","affiliation":[{"name":"University of Campinas, ETH Z\u00fcrich"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yaohua","family":"Wang","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, National University of Defense 
Technology"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hamid","family":"Sarbazi-Azad","sequence":"additional","affiliation":[{"name":"Sharif University of Technology, IPM"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Onur","family":"Mutlu","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Carnegie Mellon University"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2019,2,27]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522337"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540719"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934583.2934606"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/2755753.2757164"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2015.26"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835953"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of HPCA","author":"Arora Manish","year":"2015"},{"key":"e_1_2_1_8_1","unstructured":"Rachata Ausavarungnirun. 2017. Techniques for Shared Resource Management in Systems With Throughput Processors. Ph.D. Dissertation. Carnegie Mellon University, Pittsburgh, PA."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/2337159.2337207"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2015.38"},{"key":"e_1_2_1_11_1","unstructured":"Rachata Ausavarungnirun, Saugata Ghose, Onur Kay\u0131ran, Gabriel H. Loh, Chita R. Das, Mahmut T. Kandemir, and Onur Mutlu. 2018. Holistic management of the GPGPU memory hierarchy to manage warp-level latency tolerance. arXiv:1804.11038."},
{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123975"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173169"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of ISPASS","author":"Bakhoda Ali","year":"2009"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2012.124"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2896377.2901453"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078505.3078590"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.33"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of HPCA","author":"Chen Lizhong","year":"2014"},{"key":"e_1_2_1_21_1","unstructured":"Pran Kurup and Taher Abbasi. 2011. Logic Synthesis Using Synopsys (2nd Edition). Springer Publishing Company, Incorporated."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1998582.1998590"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1950365.1950392"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/545215.545232"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Denis Foley, Pankaj Bansal, Don Cherepacha, Robert Wasmuth, Aswin Gunasekar, Srinivasa Gutta, et al. 2011. A low-power integrated x86-64 and graphics processor for mobile computing devices. In Proceedings of ISSCC 2011.","DOI":"10.1109\/JSSC.2011.2167776"},
{"key":"e_1_2_1_26_1","volume-title":"Proceedings of HPCA","author":"Wilson W.","year":"2011"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.12"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000093"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540716"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522330"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446104"},{"key":"e_1_2_1_32_1","unstructured":"David Hodges, Horace Jackson, and Resve Saleh. 2004. Analysis and Design of Digital Integrated Circuits in Deep Submicron Technology. McGraw-Hill."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815998"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1013235.1013249"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2005.93"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.13"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/2738600.2738602"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195655"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2451116.2451158"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2896377.2901468"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/2485288.2485384"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2013.6657030"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228572"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher"
,"DOI":"10.1145\/3061639.3062313"},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of PACT","author":"Kay\u0131ran Onur","year":"2013"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2967938.2967941"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.62"},{"key":"e_1_2_1_48_1","unstructured":"Pierre Bricaud. 2012. Reuse Methodology Manual: For System-on-a-chip Designs. Springer Science and Business Media."},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of ITC","author":"Keshavarzi Ali","year":"1997"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00038"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00073"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830796"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of IPDPS","author":"Khorasani Farzad","year":"2016"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024724.2024932"},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of MICRO","author":"Kim Nam Sung","year":"2002"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123974"},{"key":"e_1_2_1_57_1","volume-title":"Retrieved","author":"Knudsen Jesper","year":"2008"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2013.6657064"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628107"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485964"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2011.25"},{"key":"e_1_2_1_62_1","volume-title":"Proceedings of HPCA","author":"Lia 
H.","year":"2003"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.51"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/1594233.1594331"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/279358.279377"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/NOCS.2010.16"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.5555\/1397757.1397983"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3294049"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130218.3130222"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669151"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155656"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2018.2873679"},{"key":"e_1_2_1_73_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2008"},{"key":"e_1_2_1_74_1","volume-title":"Whitepaper: NVIDIA\u2019s Next Generation CUDA<sup>TM<\/sup> Compute Architecture: Fermi<sup>TM<\/sup>. Technical Report. NVIDIA.","author":"NVIDIA.","year":"2009"},{"key":"e_1_2_1_75_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2016"},{"key":"e_1_2_1_76_1","volume-title":"White Paper: NVIDIA Tesla P100. Technical Report. 
NVIDIA.","author":"NVIDIA.","year":"2016"},{"key":"e_1_2_1_77_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2018"},{"key":"e_1_2_1_78_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2018"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2014.6974712"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2593151"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2015.2430853"},{"key":"e_1_2_1_82_1","volume-title":"Proceedings of HPCA","author":"Pekhimenko Gennady","year":"2016"},{"key":"e_1_2_1_83_1","volume-title":"Proceedings of DATE","author":"Rahimi Abbas","year":"2016"},{"key":"e_1_2_1_84_1","volume-title":"Proceedings of DATE","author":"Rahimi Abbas","year":"2015"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485953"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339668"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2002.808156"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISLPED.2015.7273522"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173211"},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.5555\/3130379.3130387"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2593086"},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522345"},{"key":"e_1_2_1_93_1","volume-title":"Proceedings of PACT","author":"Seething Ankit","year":"2010"},{"key":"e_1_2_1_94_1","unstructured":"Hynix Semiconductor. 2009. Hynix GDDR5 SGRAM Part H5GQ1H24AFR Revision 1.0. http:\/\/www.hynix.com\/datasheet\/pdf\/graphics\/H5GQ1H24AFR(Rev1.0).pdf."},
{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.16"},{"key":"e_1_2_1_96_1","volume-title":"Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing. Technical Report","author":"Stratton John A.","year":"2012"},{"key":"e_1_2_1_97_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485954"},{"key":"e_1_2_1_98_1","doi-asserted-by":"publisher","DOI":"10.5555\/2665671.2665710"},{"key":"e_1_2_1_99_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00074"},{"key":"e_1_2_1_100_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195656"},{"key":"e_1_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750399"},{"key":"e_1_2_1_102_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2009.1"},{"key":"e_1_2_1_103_1","doi-asserted-by":"publisher","DOI":"10.1145\/2019608.2019612"},{"key":"e_1_2_1_104_1","volume-title":"Proceedings of DATE","author":"Wang Yu","year":"2012"},{"key":"e_1_2_1_105_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628105"},{"key":"e_1_2_1_106_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.40"},{"key":"e_1_2_1_107_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000094"},{"issue":"5","key":"e_1_2_1_108_1","first-page":"630","article-title":"Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order","author":"Zuravleff William K.","year":"1997","journal-title":"Patent"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291606","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3291606","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:02:06Z","timestamp":1750208526000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291606"}},"subtitle":["Idle-Time-Aware Power Management for GPU Execution Units"],"short-title":[],"issued":{"date-parts":[[2019,2,27]]},"references-count":108,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,3,31]]}},"alternative-id":["10.1145\/3291606"],"URL":"https:\/\/doi.org\/10.1145\/3291606","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,2,27]]},"assertion":[{"value":"2018-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-02-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}