{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T17:32:46Z","timestamp":1778693566559,"version":"3.51.4"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,5,26]],"date-time":"2022-05-26T00:00:00Z","timestamp":1653523200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DE-SC0012704"],"award-info":[{"award-number":["DE-SC0012704"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Meas. Anal. Comput. Syst."],"published-print":{"date-parts":[[2022,5,26]]},"abstract":"<jats:p>While cycle-accurate simulators are essential tools for architecture research, design, and development, their practicality is limited by an extremely long time-to-solution for realistic applications under investigation. This work describes a concerted effort, where machine learning (ML) is used to accelerate microarchitecture simulation. First, an ML-based instruction latency prediction framework that accounts for both static instruction properties and dynamic processor states is constructed. Then, a GPU-accelerated parallel simulator is implemented based on the proposed instruction latency predictor, and its simulation accuracy and throughput are validated and evaluated against a state-of-the-art simulator. 
Leveraging modern GPUs, the ML-based simulator outperforms traditional CPU-based simulators significantly.<\/jats:p>","DOI":"10.1145\/3530891","type":"journal-article","created":{"date-parts":[[2022,6,6]],"date-time":"2022-06-06T17:16:18Z","timestamp":1654535778000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["SimNet"],"prefix":"10.1145","volume":"6","author":[{"given":"Lingda","family":"Li","sequence":"first","affiliation":[{"name":"Brookhaven National Laboratory, Upton, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Santosh","family":"Pandey","sequence":"additional","affiliation":[{"name":"Stevens Institute of Technology, Hoboken, NJ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Flynn","sequence":"additional","affiliation":[{"name":"Brookhaven National Laboratory, Upton, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hang","family":"Liu","sequence":"additional","affiliation":[{"name":"Stevens Institute of Technology, Hoboken, NJ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Noel","family":"Wheeler","sequence":"additional","affiliation":[{"name":"Laboratory for Physical Sciences, College Park, MD, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adolfy","family":"Hoisie","sequence":"additional","affiliation":[{"name":"Brookhaven National Laboratory, Upton, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,6,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2020. DGX A100: Universal System for AI Infrastructure. https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-a100\/  2020. DGX A100: Universal System for AI Infrastructure. 
https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-a100\/"},{"key":"e_1_2_1_2_1","volume-title":"Tensorflow: A system for large-scale machine learning. In 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16). 265--283.","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , 2016 . Tensorflow: A system for large-scale machine learning. In 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16). 265--283. Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. Tensorflow: A system for large-scale machine learning. In 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16). 265--283."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00046"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830780"},{"key":"e_1_2_1_5_1","volume-title":"2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing. 254--261. https:\/\/doi.org\/10.1109\/SBAC-PAD.2014.30 Proc. ACM Meas. Anal. Comput. Syst.","volume":"6","author":"Baldini I.","year":"2022","unstructured":"I. Baldini , S. J. Fink , and E. Altman . 2014. Predicting GPU Performance from CPU Runs Using Machine Learning . In 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing. 254--261. https:\/\/doi.org\/10.1109\/SBAC-PAD.2014.30 Proc. ACM Meas. Anal. Comput. Syst. , Vol. 6 , No. 2, Article 25. Publication date : June 2022 . 25:22 Lingda Li, Santosh Pandey, Thomas Flynn, Hang Liu, Noel Wheeler, and Adolfy Hoisie 10.1109\/SBAC-PAD.2014.30 I. Baldini, S. J. Fink, and E. Altman. 2014. 
Predicting GPU Performance from CPU Runs Using Machine Learning. In 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing. 254--261. https:\/\/doi.org\/10.1109\/SBAC-PAD.2014.30"},{"key":"e_1_2_1_6_1","volume-title":"FREENIX Track","volume":"41","author":"Bellard Fabrice","year":"2005","unstructured":"Fabrice Bellard. 2005. QEMU, a fast and portable dynamic translator. In USENIX Annual Technical Conference, FREENIX Track, Vol. 41. California, USA, 46."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3185768.3185771"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844456"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.1996.563595"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/3322706.3361996"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/2015039.2015543"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844457"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_15_1","volume-title":"Long short-term memory. Neural computation 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory.
Neural computation 9, 8 (1997), 1735--1780."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1168857.1168882"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the Fourth Annual Workshop on Modeling, Benchmarking and Simulation (MoBS), co-located with ISCA. 28--36","author":"Jaleel Aamer","year":"2008","unstructured":"Aamer Jaleel, Robert S Cohn, Chi-Keung Luk, and Bruce Jacob. 2008. CMP$im: A Pin-based on-the-fly multi-core cache simulator. In Proceedings of the Fourth Annual Workshop on Modeling, Benchmarking and Simulation (MoBS), co-located with ISCA. 28--36."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.4018\/jdst.2010040104"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/3433701.3433707"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"#cr-split#-e_1_2_1_21_1.1","doi-asserted-by":"crossref","unstructured":"Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qijing Huang, Kyle Kovacs, Borivoje Nikolic, Randy Katz, Jonathan Bachrach, and Krste Asanovi\u0107. 2018. Firesim: FPGA-Accelerated Cycle-Exact Scale-out System Simulation in the Public Cloud. In Proceedings of the 45th Annual International Symposium on Computer Architecture (Los Angeles, California) (ISCA '18). IEEE Press, 29--42. https:\/\/doi.org\/10.1109\/ISCA.2018.00014","DOI":"10.1109\/ISCA.2018.00014"},
{"key":"#cr-split#-e_1_2_1_21_1.2","doi-asserted-by":"crossref","unstructured":"Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qijing Huang, Kyle Kovacs, Borivoje Nikolic, Randy Katz, Jonathan Bachrach, and Krste Asanovi\u0107. 2018. Firesim: FPGA-Accelerated Cycle-Exact Scale-out System Simulation in the Public Cloud. In Proceedings of the 45th Annual International Symposium on Computer Architecture (Los Angeles, California) (ISCA '18). IEEE Press, 29--42. https:\/\/doi.org\/10.1109\/ISCA.2018.00014","DOI":"10.1109\/ISCA.2018.00014"},{"key":"e_1_2_1_22_1","volume-title":"Adam: A Method for Stochastic Optimization. international conference on learning representations","author":"Diederik Kingma P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Lei Ba. 2015. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations (2015)."},{"key":"e_1_2_1_23_1","volume-title":"Evaluation of the riken post-k processor simulator. arXiv preprint arXiv:1904.06451","author":"Kodama Yuetsu","year":"2019","unstructured":"Yuetsu Kodama, Tetsuya Odajima, Akira Asato, and Mitsuhisa Sato. 2019. Evaluation of the riken post-k processor simulator.
arXiv preprint arXiv:1904.06451 (2019)."},{"key":"e_1_2_1_24_1","volume-title":"Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012), 1097--1105."},{"key":"e_1_2_1_25_1","volume-title":"Illustrative Design Space Studies with Microarchitectural Regression Models. In 2007 IEEE 13th International Symposium on High Performance Computer Architecture. 340--351","author":"Lee B. C.","year":"2007","unstructured":"B. C. Lee and D. M. Brooks. 2007. Illustrative Design Space Studies with Microarchitectural Regression Models. In 2007 IEEE 13th International Symposium on High Performance Computer Architecture. 340--351. https:\/\/doi.org\/10.1109\/HPCA.2007.346211"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","author":"Lee Benjamin C.","unstructured":"Benjamin C. Lee, David M. Brooks, Bronis R. de Supinski, Martin Schulz, Karan Singh, and Sally A. McKee. 2007. Methods of Inference and Learning for Performance Modeling of Parallel Applications. In Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (San Jose, California, USA) (PPoPP '07).
Association for Computing Machinery, New York, NY, USA, 249--258. https:\/\/doi.org\/10.1145\/1229428.1229479"},{"key":"e_1_2_1_27_1","first-page":"I","article-title":"A Unified Approach to Interpreting Model Predictions","volume":"30","author":"Lundberg Scott M","year":"2017","unstructured":"Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4765--4774.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_28_1","volume-title":"International Conference on Machine Learning. PMLR, 4505--4515","author":"Mendis Charith","year":"2019","unstructured":"Charith Mendis, Alex Renda, Saman Amarasinghe, and Michael Carbin. 2019. Ithemal: Accurate, portable and fast basic block throughput estimation using deep neural networks.
In International Conference on Machine Learning. PMLR, 4505--4515."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2017.23"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture","author":"Nikoleris Nikos","unstructured":"Nikos Nikoleris, Lieven Eeckhout, Erik Hagersten, and Trevor E. Carlson. 2019. Directed Statistical Warming through Time Traveling. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (Columbus, OH, USA) (MICRO '52). Association for Computing Machinery, New York, NY, USA, 1037--1049. https:\/\/doi.org\/10.1145\/3352460.3358264"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126557"},
{"key":"e_1_2_1_32_1","volume-title":"Garnett (Eds.)","volume":"32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc., 8026--8037. https:\/\/proceedings.neurips.cc\/paper\/2019\/file\/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf"},{"key":"e_1_2_1_33_1","volume-title":"2011 48th ACM\/EDAC\/IEEE Design Automation Conference (DAC). 1050--1055","author":"Patel A.","unstructured":"A. Patel, F. Afram, S. Chen, and K. Ghose. 2011. MARSS: A full system simulator for multicore x86 CPUs. In 2011 48th ACM\/EDAC\/IEEE Design Automation Conference (DAC). 1050--1055."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.28"},{"key":"e_1_2_1_35_1","volume-title":"A survey of machine learning applied to computer architecture design. arXiv preprint arXiv:1909.12373","author":"Penney Drew D","year":"2019","unstructured":"Drew D Penney and Lizhong Chen. 2019. A survey of machine learning applied to computer architecture design.
arXiv preprint arXiv:1909.12373 (2019)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/885651.781076"},{"key":"e_1_2_1_37_1","volume-title":"DiffTune: Optimizing CPU Simulator Parameters with Learned Differentiable Surrogates. In IEEE\/ACM International Symposium on Microarchitecture.","author":"Renda Alex","year":"2020","unstructured":"Alex Renda, Yishen Chen, Charith Mendis, and Michael Carbin. 2020. DiffTune: Optimizing CPU Simulator Parameters with Learned Differentiable Surrogates. In IEEE\/ACM International Symposium on Microarchitecture."},{"key":"e_1_2_1_38_1","volume-title":"Learning representations by back-propagating errors. nature 323, 6088","author":"Rumelhart David E","year":"1986","unstructured":"David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533--536."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485963"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2015.29"},{"key":"e_1_2_1_41_1","volume-title":"SC20: International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Sato M.","year":"2020","unstructured":"M. Sato, Y. Ishikawa, H. Tomita, Y. Kodama, T. Odajima, M. Tsuji, H. Yashiro, M. Aoki, N. Shida, I. Miyoshi, K. Hirai, A. Furuya, A. Asato, K. Morita, and T. Shimizu. 2020. Co-Design for A64FX Manycore Processor and \"Fugaku\". In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis."},
{"key":"e_1_2_1_42_1","volume-title":"Alex Bridgland, et al.","author":"Senior Andrew W","year":"2020","unstructured":"Andrew W Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin \u017d\u00eddek, Alexander WR Nelson, Alex Bridgland, et al. 2020. Improved protein structure prediction using potentials from deep learning. Nature 577, 7792 (2020), 706--710."},{"key":"e_1_2_1_43_1","volume-title":"TAGE-SC-L Branch Predictors Again. In 5th JILP Workshop on Computer Architecture Competitions (JWAC-5): Championship Branch Prediction (CBP-5)","author":"Seznec Andr\u00e9","year":"2016","unstructured":"Andr\u00e9 Seznec. 2016.
TAGE-SC-L Branch Predictors Again. In 5th JILP Workshop on Computer Architecture Competitions (JWAC-5): Championship Branch Prediction (CBP-5). Seoul, South Korea. https:\/\/hal.inria.fr\/hal-01354253"},{"key":"e_1_2_1_44_1","volume-title":"A value for n-person games. Contributions to the Theory of Games 2, 28","author":"Shapley Lloyd S","year":"1953","unstructured":"Lloyd S Shapley. 1953. A value for n-person games. Contributions to the Theory of Games 2, 28 (1953), 307--317."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605403"},{"key":"e_1_2_1_46_1","volume-title":"Spectral Audio Signal Processing. https:\/\/ccrma.stanford.edu\/~jos\/sasp online book","author":"Smith Julius O.","year":"2011","unstructured":"Julius O. Smith. 2011. Spectral Audio Signal Processing. https:\/\/ccrma.stanford.edu\/~jos\/sasp online book, 2011 edition."},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Tan Mingxing","unstructured":"Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. 2019. MnasNet: Platform-Aware Neural Architecture Search for Mobile.
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"97","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 6105--6114. http:\/\/proceedings.mlr.press\/v97\/tan19a.html"},{"key":"e_1_2_1_49_1","volume-title":"Cache Simulation for Instruction Set Simulator QEMU. In 2014 IEEE 12th International Conference on Dependable, Autonomic and Secure Computing. 441--446","author":"Dung Tran Van","year":"2014","unstructured":"Tran Van Dung, Ittetsu Taniguchi, and Hiroyuki Tomiyama. 2014. Cache Simulation for Instruction Set Simulator QEMU. In 2014 IEEE 12th International Conference on Dependable, Autonomic and Secure Computing. 441--446. https:\/\/doi.org\/10.1109\/DASC.2014.85"},{"key":"e_1_2_1_50_1","unstructured":"Han Vanholder. 2016.
Efficient inference with tensorrt."},{"key":"e_1_2_1_51_1","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.79"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056063"},{"key":"e_1_2_1_54_1","volume-title":"A Survey of Machine Learning for Computer Architecture and Systems. arXiv preprint arXiv:2102.07952","author":"Wu Nan","year":"2021","unstructured":"Nan Wu and Yuan Xie. 2021. A Survey of Machine Learning for Computer Architecture and Systems. arXiv preprint arXiv:2102.07952 (2021)."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the 30th Annual International Symposium on Computer Architecture","author":"Wunderlich Roland E.","unstructured":"Roland E. Wunderlich, Thomas F. Wenisch, Babak Falsafi, and James C. Hoe. 2003. SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling.
In Proceedings of the 30th Annual International Symposium on Computer Architecture (San Diego, California) (ISCA '03). Association for Computing Machinery, New York, NY, USA, 84--97. https:\/\/doi.org\/10.1145\/859618.859629"},{"key":"e_1_2_1_56_1","volume-title":"Hot Chips","volume":"30","author":"Yoshida Toshio","year":"2018","unstructured":"Toshio Yoshida . 2018 . Fujitsu high performance CPU for the Post-K Computer . In Hot Chips , Vol. 30 . Toshio Yoshida. 2018. Fujitsu high performance CPU for the Post-K Computer. In Hot Chips, Vol. 30."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2897977"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/SAMOS.2015.7363659"}],"container-title":["Proceedings of the ACM on Measurement and Analysis of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3530891","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3530891","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3530891","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:26Z","timestamp":1750183766000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3530891"}},"subtitle":["Accurate and High-Performance Computer Architecture Simulation using Deep 
Learning"],"short-title":[],"issued":{"date-parts":[[2022,5,26]]},"references-count":59,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,5,26]]}},"alternative-id":["10.1145\/3530891"],"URL":"https:\/\/doi.org\/10.1145\/3530891","relation":{},"ISSN":["2476-1249"],"issn-type":[{"value":"2476-1249","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,26]]},"assertion":[{"value":"2022-06-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}