{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T18:41:51Z","timestamp":1771612911187,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":77,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,10,28]],"date-time":"2023-10-28T00:00:00Z","timestamp":1698451200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nd\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,10,28]]},"DOI":"10.1145\/3613424.3614277","type":"proceedings-article","created":{"date-parts":[[2023,12,8]],"date-time":"2023-12-08T17:22:15Z","timestamp":1702056135000},"page":"380-394","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-2737-0583","authenticated-orcid":false,"given":"Ying","family":"Li","sequence":"first","affiliation":[{"name":"William & Mary, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3532-6521","authenticated-orcid":false,"given":"Yifan","family":"Sun","sequence":"additional","affiliation":[{"name":"William & Mary, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5525-7204","authenticated-orcid":false,"given":"Adwait","family":"Jog","sequence":"additional","affiliation":[{"name":"University of Virginia, United States of America"}]}],"member":"320","published-online":{"date-parts":[[2023,12,8]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Accessed: 2023. NVIDIA Nsight Compute. 
https:\/\/developer.nvidia.com\/nsight-compute."},{"key":"e_1_3_2_1_2_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg\u00a0S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dandelion Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 
https:\/\/www.tensorflow.org\/ Software available from tensorflow.org."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2016.7581275"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919636"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480100"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1693453.1693470"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919648"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3559009.3569666"},{"key":"e_1_3_2_1_9_1","volume-title":"The gem5 simulator. ACM SIGARCH computer architecture news 39, 2","author":"Binkert Nathan","year":"2011","unstructured":"Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven\u00a0K Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek\u00a0R Hower, Tushar Krishna, Somayeh Sardashti, 2011. The gem5 simulator. ACM SIGARCH computer architecture news 39, 2 (2011), 1\u20137."},{"key":"e_1_3_2_1_10_1","volume-title":"DNNOff: offloading DNN-based intelligent IoT applications in mobile edge computing","author":"Chen Xing","year":"2021","unstructured":"Xing Chen, Ming Li, Hao Zhong, Yun Ma, and Ching-Hsien Hsu. 2021. DNNOff: offloading DNN-based intelligent IoT applications in mobile edge computing. 
IEEE transactions on industrial informatics 18, 4 (2021), 2820\u20132829."},{"key":"e_1_3_2_1_11_1","volume-title":"cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)."},{"key":"e_1_3_2_1_12_1","volume-title":"cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cudnn: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)."},{"key":"e_1_3_2_1_13_1","first-page":"800","article-title":"Tensorflow lite micro: Embedded machine learning for tinyml systems","volume":"3","author":"David Robert","year":"2021","unstructured":"Robert David, Jared Duke, Advait Jain, Vijay Janapa\u00a0Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Tiezhen Wang, 2021. 
Tensorflow lite micro: Embedded machine learning for tinyml systems. Proceedings of Machine Learning and Systems 3 (2021), 800\u2013811.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_14_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_1_15_1","unstructured":"Ashraf Eassa. 2018. Why NVIDIA Corp. Will Design a Fully Custom Chip for Future AI Workloads. https:\/\/www.fool.com\/investing\/2018\/04\/26\/why-nvidia-corp-will-design-a-fully-custom-chip-fo.aspx."},{"key":"e_1_3_2_1_17_1","volume-title":"https:\/\/keras.io\/guides\/ [Accessed","year":"2022","unstructured":"Google. 2022. Keras. https:\/\/keras.io\/guides\/ [Accessed October 6, 2022]."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2010.5649549"},{"key":"e_1_3_2_1_19_1","volume-title":"Recent advances in convolutional neural networks. Pattern recognition 77","author":"Gu Jiuxiang","year":"2018","unstructured":"Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Gang Wang, Jianfei Cai, 2018. Recent advances in convolutional neural networks. 
Pattern recognition 77 (2018), 354\u2013377."},{"key":"e_1_3_2_1_20_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Gujarati Arpan","year":"2020","unstructured":"Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, and Jonathan Mace. 2020. Serving DNNs like Clockwork: Performance Predictability from the Bottom Up. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). 443\u2013462."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00058"},{"key":"e_1_3_2_1_22_1","unstructured":"Gareth Halfacree. 2017. AMD Signs Semi-Custom AI Chip Deal with Tesla Source Claims. https:\/\/bit-tech.net\/news\/tech\/cpus\/amd-signs-semi-custom-ai-chip-deal-with-tesla-source-claims\/1\/."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3578244.3583736"},{"key":"e_1_3_2_1_24_1","unstructured":"Yueming Hao Xu Zhao Bin Bao David Berard Will Constable Adnan Aziz and Xu Liu. 2023. 
TorchBench: Benchmarking PyTorch with High API Surface Coverage. arxiv:2304.14226\u00a0[cs.LG]"},{"key":"e_1_3_2_1_25_1","volume-title":"The State of Machine Learning Frameworks","author":"Horace He.","year":"2019","unstructured":"Horace He. 2019. The State of Machine Learning Frameworks in 2019. https:\/\/thegradient.pub\/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry\/. The Gradient (2019)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00047"},{"key":"e_1_3_2_1_28_1","unstructured":"Myeongjae Jeon and Shivaram Venkataraman. [n. d.]. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/3488766.3488792"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2018.8622396"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CADS.2013.6714232"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735696"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00047"},{"key":"e_1_3_2_1_34_1","volume-title":"MIOpen: An open source library for deep learning primitives. arXiv preprint arXiv:1910.00078","author":"Khan Jehandad","year":"2019","unstructured":"Jehandad Khan, Paul Fultz, Artem Tamazov, Daniel Lowell, Chao Liu, Michael Melesse, Murali Nandhimandalam, Kamil Nasyrov, Ilya Perminov, Tejash Shah, 2019. MIOpen: An open source library for deep learning primitives. 
arXiv preprint arXiv:1910.00078 (2019)."},{"key":"e_1_3_2_1_35_1","unstructured":"Jehandad Khan Paul Fultz Artem Tamazov Daniel Lowell Chao Liu Michael Melesse Murali Nandhimandalam Kamil Nasyrov Ilya Perminov Tejash Shah Vasilii Filippov Jing Zhang Jing Zhou Bragadeesh Natarajan and Mayank Daga. 2019. MIOpen: An Open Source Library For Deep Learning Primitives. arxiv:1910.00078\u00a0[cs.LG]"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2019.2929165"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS57527.2023.00047"},{"key":"e_1_3_2_1_39_1","volume-title":"A survey of convolutional neural networks: analysis, applications, and prospects","author":"Li Zewen","year":"2021","unstructured":"Zewen Li, Fan Liu, Wenjie Yang, Shouheng Peng, and Jun Zhou. 2021. A survey of convolutional neural networks: analysis, applications, and prospects. IEEE transactions on neural networks and learning systems (2021)."},{"key":"e_1_3_2_1_40_1","volume-title":"PerfNetRT: Platform-Aware Performance Modeling for Optimized Deep Neural Networks. 
In 2020 International Computer Symposium (ICS). IEEE, 153\u2013158","author":"Liao Ying-Chiao","year":"2020","unstructured":"Ying-Chiao Liao, Chuan-Chi Wang, Chia-Heng Tu, Ming-Chang Kao, Wen-Yew Liang, and Shih-Hao Hung. 2020. PerfNetRT: Platform-Aware Performance Modeling for Optimized Deep Neural Networks. In 2020 International Computer Symposium (ICS). IEEE, 153\u2013158."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299107"},{"key":"e_1_3_2_1_42_1","volume-title":"The gem5 simulator: Version 20.0+. arXiv preprint arXiv:2007.03152","author":"Lowe-Power Jason","year":"2020","unstructured":"Jason Lowe-Power, Abdul\u00a0Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adri\u00e0 Armejach, Nils Asmussen, Brad Beckmann, Srikant Bharadwaj, 2020. The gem5 simulator: Version 20.0+. arXiv preprint arXiv:2007.03152 (2020)."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874254"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2018.8573521"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3469029"},{"key":"e_1_3_2_1_46_1","volume-title":"https:\/\/developer.nvidia.com\/cuda-toolkit [Accessed","author":"NVIDIA.","year":"2022","unstructured":"NVIDIA. 2022. CUDA. 
https:\/\/developer.nvidia.com\/cuda-toolkit [Accessed October 6, 2022]."},{"key":"e_1_3_2_1_47_1","volume-title":"cuFFT. https:\/\/developer.nvidia.com\/cufft [Accessed","author":"NVIDIA.","year":"2022","unstructured":"NVIDIA. 2022. cuFFT. https:\/\/developer.nvidia.com\/cufft [Accessed October 6, 2022]."},{"key":"e_1_3_2_1_48_1","volume-title":"Matrix Multiplication Background. https:\/\/docs.nvidia.com\/deeplearning\/performance\/dl-performance-matrix-multiplication\/index.html [Accessed","author":"NVIDIA.","year":"2022","unstructured":"NVIDIA. 2022. Matrix Multiplication Background. https:\/\/docs.nvidia.com\/deeplearning\/performance\/dl-performance-matrix-multiplication\/index.html [Accessed October 6, 2022]."},{"key":"e_1_3_2_1_49_1","volume-title":"OpenAI and Microsoft Extend Partnership. https:\/\/openai.com\/blog\/openai-and-microsoft-extend-partnership. Accessed on","author":"AI.","year":"2023","unstructured":"OpenAI. 2022. OpenAI and Microsoft Extend Partnership. https:\/\/openai.com\/blog\/openai-and-microsoft-extend-partnership. Accessed on: April 28, 2023."},{"key":"e_1_3_2_1_51_1","volume-title":"An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458","author":"O\u2019Shea Keiron","year":"2015","unstructured":"Keiron O\u2019Shea and Ryan Nash. 2015. An introduction to convolutional neural networks. 
arXiv preprint arXiv:1511.08458 (2015)."},{"key":"e_1_3_2_1_52_1","volume-title":"fairseq: A fast, extensible toolkit for sequence modeling. arXiv preprint arXiv:1904.01038","author":"Ott Myle","year":"2019","unstructured":"Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. arXiv preprint arXiv:1904.01038 (2019)."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2019.00042"},{"key":"e_1_3_2_1_54_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_2_1_55_1","volume-title":"Paleo: A performance model for deep neural networks.","author":"Qi Hang","year":"2016","unstructured":"Hang Qi, Evan\u00a0R Sparks, and Ameet Talwalkar. 2016. Paleo: A performance model for deep neural networks. 
(2016)."},{"key":"e_1_3_2_1_56_1","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS48437.2020.00018"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS48437.2020.00016"},{"key":"e_1_3_2_1_60_1","volume-title":"Scale-sim: Systolic cnn accelerator simulator. arXiv preprint arXiv:1811.02883","author":"Samajdar Ananda","year":"2018","unstructured":"Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna. 2018. Scale-sim: Systolic cnn accelerator simulator. arXiv preprint arXiv:1811.02883 (2018)."},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3381831"},{"key":"e_1_3_2_1_63_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322230"},{"key":"e_1_3_2_1_65_1","volume-title":"Evaluating Performance Tradeoffs on the Radeon Open Compute Platform. In IEEE International Symposium on Performance Analysis of Systems and Software.","author":"Sun Yifan","year":"2018","unstructured":"Yifan Sun, Saoni Mukherjee, Trinayan Baruah, Shi Dong, Julian Gutierrez, Prannory Mohan, and David Kaeli. 2018. Evaluating Performance Tradeoffs on the Radeon Open Compute Platform. In IEEE International Symposium on Performance Analysis of Systems and Software."},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"e_1_3_2_1_67_1","volume-title":"Serving DNN models with multi-instance gpus: A case of the reconfigurable machine scheduling problem. arXiv preprint arXiv:2109.11067","author":"Tan Cheng","year":"2021","unstructured":"Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, and Chuanxiong Guo. 2021. Serving DNN models with multi-instance gpus: A case of the reconfigurable machine scheduling problem. arXiv preprint arXiv:2109.11067 (2021)."},{"key":"e_1_3_2_1_68_1","volume-title":"http:\/\/www.khronos.org\/opencl\/ [Accessed","author":"The Khronos OpenCL Working Group","year":"2022","unstructured":"
The Khronos OpenCL Working Group. 2022. OpenCL. http:\/\/www.khronos.org\/opencl\/ [Accessed October 6, 2022]."},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00077"},{"key":"e_1_3_2_1_70_1","volume-title":"Huggingface\u2019s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771","author":"Wolf Thomas","year":"2019","unstructured":"Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, R\u00e9mi Louf, Morgan Funtowicz, 2019. Huggingface\u2019s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019)."},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3424669"},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1611.05431"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.5555\/3485849.3485855"},{"key":"e_1_3_2_1_75_1","volume-title":"Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training. arXiv preprint arXiv:2208.06102","author":"You Jie","year":"2022","unstructured":"Jie You, Jae-Won Chung, and Mosharaf Chowdhury. 2022. Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training. 
arXiv preprint arXiv:2208.06102 (2022)."},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458864.3467882"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00093"},{"key":"e_1_3_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507708"},{"key":"e_1_3_2_1_79_1","volume-title":"THOP: PyTorch-OpCounter. THOP: PyTorch-OpCounter","author":"Zhu Ligeng","year":"2022","unstructured":"Ligeng Zhu. 2022. THOP: PyTorch-OpCounter. THOP: PyTorch-OpCounter"}],"event":{"name":"MICRO '23: 56th Annual IEEE\/ACM International Symposium on Microarchitecture","location":"Toronto ON Canada","acronym":"MICRO '23","sponsor":["SIGMICRO ACM Special Interest Group on Microarchitectural Research and Processing"]},"container-title":["56th Annual IEEE\/ACM International Symposium on Microarchitecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3613424.3614277","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3613424.3614277","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:29Z","timestamp":1750178189000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3613424.3614277"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,28]]},"references-count":77,"alternative-id":["10.1145\/3613424.3614277","10.1145\/3613424"],"URL":"https:\/\/doi.org\/10.1145\/3613424.3614277","relation":{},"subject":[],"published":{"date-parts":[[2023,10,28]]},"assertion":[{"value":"2023-12-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}