{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T05:20:23Z","timestamp":1755926423800,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":45,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,2,23]],"date-time":"2020-02-23T00:00:00Z","timestamp":1582416000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Hong Kong RGC","award":["HKBU 12200418"],"award-info":[{"award-number":["HKBU 12200418"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,2,23]]},"DOI":"10.1145\/3366428.3380767","type":"proceedings-article","created":{"date-parts":[[2020,2,19]],"date-time":"2020-02-19T22:50:35Z","timestamp":1582152635000},"page":"31-40","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["GPGPU performance estimation for frequency scaling using cross-benchmarking"],"prefix":"10.1145","author":[{"given":"Qiang","family":"Wang","sequence":"first","affiliation":[{"name":"Hong Kong Baptist University"}]},{"given":"Chengjian","family":"Liu","sequence":"additional","affiliation":[{"name":"Shenzhen Technology University"}]},{"given":"Xiaowen","family":"Chu","sequence":"additional","affiliation":[{"name":"Hong Kong Baptist University"}]}],"member":"320","published-online":{"date-parts":[[2020,2,23]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.23"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2019.2904497"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2962131"},{"key":"e_1_3_2_1_4_1","first-page":"1484","article-title":"G-CRS: GPU accelerated Cauchy Reed-Solomon coding","volume":"29","author":"Liu X. Chu C.","year":"2018","unstructured":"X. Chu C. Liu , Q. Wang and Y.W. Leung . 2018 . G-CRS: GPU accelerated Cauchy Reed-Solomon coding . IEEE TPDS 29 , 7 (2018), 1484 -- 1498 . X. Chu C. Liu, Q. Wang and Y.W. Leung. 2018. G-CRS: GPU accelerated Cauchy Reed-Solomon coding. IEEE TPDS 29, 7 (2018), 1484--1498.","journal-title":"IEEE TPDS"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Vincent Chau Xiaowen Chu Hai Liu and Yiu-Wing Leung. 2017. Energy Efficient Job Scheduling with DVFS for CPU-GPU Heterogeneous Systems. In ACM e-Energy'17.  Vincent Chau Xiaowen Chu Hai Liu and Yiu-Wing Leung. 2017. Energy Efficient Job Scheduling with DVFS for CPU-GPU Heterogeneous Systems. In ACM e-Energy'17.","DOI":"10.1145\/3077839.3077855"},{"key":"e_1_3_2_1_6_1","volume-title":"IEEE International Symposium on. IEEE, 44--54","author":"Che Shuai","year":"2009","unstructured":"Shuai Che , Michael Boyer , Jiayuan Meng , David Tarjan , Jeremy W Sheaffer , Sang-Ha Lee , and Kevin Skadron . 2009 . Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization (IISWC) 2009 . IEEE International Symposium on. IEEE, 44--54 . Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization (IISWC) 2009. IEEE International Symposium on. IEEE, 44--54."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_3_2_1_8_1","unstructured":"Xiaowen Chu Kaiyong Zhao and Mea Wang. 2009. Practical random linear network coding on GPUs. In 2009 IFIP Networking.  Xiaowen Chu Kaiyong Zhao and Mea Wang. 2009. Practical random linear network coding on GPUs. In 2009 IFIP Networking."},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the 30th international conference on machine learning. 1337--1345","author":"Coates Adam","year":"2013","unstructured":"Adam Coates , Brody Huval , Tao Wang , David Wu , Bryan Catanzaro , and Ng Andrew . 2013 . Deep learning with COTS HPC systems . In Proceedings of the 30th international conference on machine learning. 1337--1345 . Adam Coates, Brody Huval, Tao Wang, David Wu, Bryan Catanzaro, and Ng Andrew. 2013. Deep learning with COTS HPC systems. In Proceedings of the 30th international conference on machine learning. 1337--1345."},{"key":"e_1_3_2_1_10_1","first-page":"1800","article-title":"A performance model for GPUs with caches","volume":"26","author":"Dao Thanh Tuan","year":"2015","unstructured":"Thanh Tuan Dao , Jungwon Kim , Sangmin Seo , Bernhard Egger , and Jaejin Lee . 2015 . A performance model for GPUs with caches . IEEE TPDS 26 , 7 (2015), 1800 -- 1813 . Thanh Tuan Dao, Jungwon Kim, Sangmin Seo, Bernhard Egger, and Jaejin Lee. 2015. A performance model for GPUs with caches. IEEE TPDS 26, 7 (2015), 1800--1813.","journal-title":"IEEE TPDS"},{"key":"e_1_3_2_1_11_1","volume-title":"Ng","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean , Greg Corrado , Rajat Monga , Kai Chen , Matthieu Devin , Mark Mao , Andrew Senior , Paul Tucker , Ke Yang , Quoc V Le , and Andrew Y . Ng . 2012 . Large scale distributed deep networks. In Advances in Neural Information Processing Systems . 1223--1231. Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, and Andrew Y. Ng. 2012. Large scale distributed deep networks. In Advances in Neural Information Processing Systems. 1223--1231."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3337821.3337833"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"J. Guerreiro A. Ilic N. Roma and P. Tomas. 2018. GPGPU Power Modeling for Multi-domain Voltage-Frequency Scaling. In 2018 IEEE HPCA. 789--800.  J. Guerreiro A. Ilic N. Roma and P. Tomas. 2018. GPGPU Power Modeling for Multi-domain Voltage-Frequency Scaling. In 2018 IEEE HPCA. 789--800.","DOI":"10.1109\/HPCA.2018.00072"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2018.02.001"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555775"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Y. Huang B. Guo and Y. Shen. 2019. GPU Energy Consumption Optimization With a Global-Based Neural Network Method. IEEE Access 7 (2019).  Y. Huang B. Guo and Y. Shen. 2019. GPU Energy Consumption Optimization With a Global-Based Neural Network Method. IEEE Access 7 (2019).","DOI":"10.1109\/ACCESS.2019.2915380"},{"key":"e_1_3_2_1_17_1","volume-title":"Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. arXiv preprint arXiv:1804.06826","author":"Jia Zhe","year":"2018","unstructured":"Zhe Jia , Marco Maggioni , Benjamin Staiger , and Daniele P Scarpazza . 2018. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. arXiv preprint arXiv:1804.06826 ( 2018 ). Zhe Jia, Marco Maggioni, Benjamin Staiger, and Daniele P Scarpazza. 2018. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. arXiv preprint arXiv:1804.06826 (2018)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2015.7054182"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CPSNA.2015.23"},{"volume-title":"Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage\/Frequency and Core Scaling. In 2011 International Conference on PACT. 111--120","author":"Lee Jungseob","key":"e_1_3_2_1_20_1","unstructured":"Jungseob Lee , Vijay Sathisha , Michael Schulte , Katherine Compton , and Nam Sung Kim .2011. Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage\/Frequency and Core Scaling. In 2011 International Conference on PACT. 111--120 . Jungseob Lee, Vijay Sathisha, Michael Schulte, Katherine Compton, and Nam Sung Kim.2011. Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage\/Frequency and Core Scaling. In 2011 International Conference on PACT. 111--120."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"You Li Kaiyong Zhao Xiaowen Chu and Jiming Liu. 2010. Speeding up k-means algorithm by GPUs. In 2010 IEEE CIT. 115--122.  You Li Kaiyong Zhao Xiaowen Chu and Jiming Liu. 2010. Speeding up k-means algorithm by GPUs. In 2010 IEEE CIT. 115--122.","DOI":"10.1109\/CIT.2010.60"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bts061"},{"key":"e_1_3_2_1_23_1","unstructured":"Xiaohan Ma Mian Dong Lin Zhong and Zhigang Deng. 2009. Statistical power consumption analysis and modeling for GPU-based computing. In ACM Hot-Power'09.  Xiaohan Ma Mian Dong Lin Zhong and Zhigang Deng. 2009. Statistical power consumption analysis and modeling for GPU-based computing. In ACM Hot-Power'09."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2015.7477461"},{"key":"e_1_3_2_1_25_1","first-page":"1","article-title":"Dissecting GPU Memory Hierarchy Through Microbenchmarking","volume":"28","author":"Mei Xinxin","year":"2017","unstructured":"Xinxin Mei and Xiaowen Chu . 2017 . Dissecting GPU Memory Hierarchy Through Microbenchmarking . IEEE TPDS 28 , 1 (Jan 2017), 72--86. Xinxin Mei and Xiaowen Chu. 2017. Dissecting GPU Memory Hierarchy Through Microbenchmarking. IEEE TPDS 28, 1 (Jan 2017), 72--86.","journal-title":"IEEE TPDS"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2017.8057205"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dcan.2016.10.001"},{"volume-title":"Network and Parallel Computing","author":"Mei Xinxin","key":"e_1_3_2_1_28_1","unstructured":"Xinxin Mei , Kaiyong Zhao , Chengjian Liu , and Xiaowen Chu . 2014. Benchmarking the memory hierarchy of modern GPUs . In Network and Parallel Computing . Springer , 144--156. Xinxin Mei, Kaiyong Zhao, Chengjian Liu, and Xiaowen Chu. 2014. Benchmarking the memory hierarchy of modern GPUs. In Network and Parallel Computing. Springer, 144--156."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/GREENCOMP.2010.5598315"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830826"},{"key":"e_1_3_2_1_31_1","unstructured":"NVIDIA. 2018. CUDA C Programming Guide. [Online] http:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html.  NVIDIA. 2018. CUDA C Programming Guide. [Online] http:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html."},{"key":"e_1_3_2_1_32_1","unstructured":"NVIDIA. 2018. GPU Computing SDK. [Online] https:\/\/developer.nvidia.com\/gpu-computing-sdk.  NVIDIA. 2018. GPU Computing SDK. [Online] https:\/\/developer.nvidia.com\/gpu-computing-sdk."},{"key":"e_1_3_2_1_33_1","unstructured":"NVIDIA. 2018. NVIDIA Management Library. [Online] https:\/\/developer.nvidia.com\/nvidia-management-library-nvml.  NVIDIA. 2018. NVIDIA Management Library. [Online] https:\/\/developer.nvidia.com\/nvidia-management-library-nvml."},{"key":"e_1_3_2_1_34_1","unstructured":"NVIDIA. 2018. NVIDIA Profiler. [Online] http:\/\/docs.nvidia.com\/cuda\/profiler-users-guide.  NVIDIA. 2018. NVIDIA Profiler. [Online] http:\/\/docs.nvidia.com\/cuda\/profiler-users-guide."},{"key":"e_1_3_2_1_35_1","unstructured":"S. Shi Q. Wang and X. Chu. 2018. Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs. In 2018 IEEE DataCom. 949--957.  S. Shi Q. Wang and X. Chu. 2018. Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs. In 2018 IEEE DataCom. 949--957."},{"volume-title":"Benchmarking State-of-the-Art Deep Learning Software Tools. In 2016 7th International Conference on Cloud Computing and Big Data (CCBD). 99--104","author":"Shi S.","key":"e_1_3_2_1_36_1","unstructured":"S. Shi , Q. Wang , P. Xu , and X. Chu . 2016 . Benchmarking State-of-the-Art Deep Learning Software Tools. In 2016 7th International Conference on Cloud Computing and Big Data (CCBD). 99--104 . S. Shi, Q. Wang, P. Xu, and X. Chu. 2016. Benchmarking State-of-the-Art Deep Learning Software Tools. In 2016 7th International Conference on Cloud Computing and Big Data (CCBD). 99--104."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Shuaiwen Song Chunyi Su Barry Rountree and Kirk W Cameron. 2013. A simplified and accurate model of power-performance efficiency on emergent gpu architectures. In 2013 IEEE IPDPS. 673--686.  Shuaiwen Song Chunyi Su Barry Rountree and Kirk W Cameron. 2013. A simplified and accurate model of power-performance efficiency on emergent gpu architectures. In 2013 IEEE IPDPS. 673--686.","DOI":"10.1109\/IPDPS.2013.73"},{"key":"e_1_3_2_1_38_1","unstructured":"Erich Strohmaier Jack Dongarra Horst Simon Martin Meuer and Hans Meuer. 2018. TOP500. [Online] https:\/\/www.top500.org\/lists\/2019\/11\/.  Erich Strohmaier Jack Dongarra Horst Simon Martin Meuer and Hans Meuer. 2018. TOP500. [Online] https:\/\/www.top500.org\/lists\/2019\/11\/."},{"volume-title":"ACM e-Energy'19.","author":"Tang Zhenheng","key":"e_1_3_2_1_39_1","unstructured":"Zhenheng Tang , Yuxin Wang , Qiang Wang , and Xiaowen Chu . 2019. The Impact of GPU DVFS on the Energy and Performance of Deep Learning: An Empirical Study . In ACM e-Energy'19. Phoenix, AZ, USA , 315--325. Zhenheng Tang, Yuxin Wang, Qiang Wang, and Xiaowen Chu. 2019. The Impact of GPU DVFS on the Energy and Performance of Deep Learning: An Empirical Study. In ACM e-Energy'19. Phoenix, AZ, USA, 315--325."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Qiang Wang and Xiaowen Chu. 2018. GPGPU Performance Estimation with Core and Memory Frequency Scaling. In 2018 IEEE ICPADS.  Qiang Wang and Xiaowen Chu. 2018. GPGPU Performance Estimation with Core and Memory Frequency Scaling. In 2018 IEEE ICPADS.","DOI":"10.1109\/PADSW.2018.8645000"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"X. Wang K. Huang A. Knoll and X. Qian. 2019. A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation. In 2019 IEEE HPCA. 506--518.  X. Wang K. Huang A. Knoll and X. Qian. 2019. A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation. In 2019 IEEE HPCA. 506--518.","DOI":"10.1109\/HPCA.2019.00062"},{"key":"e_1_3_2_1_42_1","unstructured":"Yuxin Wang Qiang Wang Shaohuai Shi Xin He Zhenheng Tang Kaiyong Zhao and Xiaowen Chu. 2019. Benchmarking the Performance and Power of AI Accelerators for AI Training. arXiv:cs.DC\/1909.06842  Yuxin Wang Qiang Wang Shaohuai Shi Xin He Zhenheng Tang Kaiyong Zhao and Xiaowen Chu. 2019. Benchmarking the Performance and Power of AI Accelerators for AI Training. arXiv:cs.DC\/1909.06842"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2010.5452013"},{"volume-title":"GPGPU performance and power estimation using machine learning. In 2015 IEEE HPCA","author":"Wu Gene","key":"e_1_3_2_1_44_1","unstructured":"Gene Wu , Joseph L Greathouse , Alexander Lyashevsky , Nuwan Jayasena , and Derek Chiou . 2015. GPGPU performance and power estimation using machine learning. In 2015 IEEE HPCA . IEEE , 564--576. Gene Wu, Joseph L Greathouse, Alexander Lyashevsky, Nuwan Jayasena, and Derek Chiou. 2015. GPGPU performance and power estimation using machine learning. In 2015 IEEE HPCA. IEEE, 564--576."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btu047"}],"event":{"name":"PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages","SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing"],"location":"San Diego California","acronym":"PPoPP '20"},"container-title":["Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3366428.3380767","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3366428.3380767","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:32:53Z","timestamp":1750199573000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3366428.3380767"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,23]]},"references-count":45,"alternative-id":["10.1145\/3366428.3380767","10.1145\/3366428"],"URL":"https:\/\/doi.org\/10.1145\/3366428.3380767","relation":{},"subject":[],"published":{"date-parts":[[2020,2,23]]},"assertion":[{"value":"2020-02-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}