{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T23:09:49Z","timestamp":1774307389777,"version":"3.50.1"},"reference-count":75,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T00:00:00Z","timestamp":1732060800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>High-performance, multi-core processors are the key to accelerating workloads in several application domains. To continue to scale performance at the limit of Moore\u2019s Law and Dennard scaling, software and hardware designers have turned to dynamic solutions that adapt to the needs of applications in a transparent, automatic way. For example, modern hardware improves its performance and power efficiency by changing the hardware configuration, like the frequency and voltage of cores, according to a number of parameters, such as the technology used or the workload running at the time. With this level of dynamism, it is essential to simulate next-generation multi-core processors in a way that can both respond to system changes and accurately determine system performance metrics. Currently, no sampled simulation platform can achieve these goals of dynamic, fast, and accurate simulation of multi-threaded workloads.<\/jats:p>\n          <jats:p>In this work, we propose a solution that allows for fast, accurate simulation in the presence of both hardware and software dynamism. To accomplish this goal, we present Pac-Sim, a novel sampled simulation methodology for fast, accurate sampled simulation that requires no upfront analysis of the workload. 
With our proposed methodology, it is now possible to simulate long-running dynamically scheduled multi-threaded programs with significant simulation speedups, even in the presence of dynamic hardware events. We evaluate Pac-Sim using the SPEC CPU2017, NPB, and PARSEC multi-threaded benchmarks with both static and dynamic thread scheduling. The experimental results show that Pac-Sim achieves a very low sampling error of 1.63% and 3.81% on average for statically and dynamically scheduled benchmarks, respectively. Pac-Sim also demonstrates significant simulation speedups as high as 523.5\u00d7 (210.3\u00d7 on average) for the training input set of SPEC CPU2017 running eight threads.<\/jats:p>","DOI":"10.1145\/3680548","type":"journal-article","created":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T11:13:30Z","timestamp":1724757210000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Pac-Sim: Simulation of Multi-threaded Workloads using Intelligent, Live Sampling"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9240-5926","authenticated-orcid":false,"given":"Changxi","family":"Liu","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9736-3822","authenticated-orcid":false,"given":"Alen","family":"Sabu","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-9296-3042","authenticated-orcid":false,"given":"Akanksha","family":"Chaudhari","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison, Madison, United 
States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5272-0231","authenticated-orcid":false,"given":"Qingxuan","family":"Kang","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8742-134X","authenticated-orcid":false,"given":"Trevor E.","family":"Carlson","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,11,20]]},"reference":[{"key":"e_1_3_2_2_2","volume-title":"International Symposium on High-Performance Computer Architecture (HPCA\u201903)","author":"Alameldeen A. R.","year":"2003","unstructured":"A. R. Alameldeen and D. A. Wood. 2003. Variability in architectural simulations of multi-threaded workloads. In International Symposium on High-Performance Computer Architecture (HPCA\u201903)."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.73"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.1999.809463"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522340"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/1496909.1496921"},{"key":"e_1_3_2_7_2","volume-title":"The NAS Parallel Benchmarks 2.0","author":"Bailey David","year":"1995","unstructured":"David Bailey, Tim Harris, William Saphir, Rob Van Der Wijngaart, Alex Woo, and Maurice Yarrow. 1995. The NAS Parallel Benchmarks 2.0. Technical Report."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2005.1430560"},{"key":"e_1_3_2_9_2","volume-title":"Benchmarking Modern Multiprocessors","author":"Bienia Christian","year":"2011","unstructured":"Christian Bienia. 2011. Benchmarking Modern Multiprocessors. Ph. D. 
Dissertation."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_3_2_11_2","volume-title":"International Symposium on Computer Architecture (ISCA\u201910)","author":"Bircher William Lloyd","year":"2010","unstructured":"William Lloyd Bircher and Lizy John. 2010. Predictive power management for multi-core processors. In International Symposium on Computer Architecture (ISCA\u201910)."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/0895-7177(03)90058-6"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3185768.3185771"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASPDAC.2015.7059093"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063454"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2013.6557141"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/2629677"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844456"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1819"},{"key":"e_1_3_2_20_2","volume-title":"International Symposium on Workload Characterization (IISWC\u201909)","author":"Charles James","year":"2009","unstructured":"James Charles, Preet Jassi, Narayan S Ananth, Abbas Sadat, and Alexandra Fedorova. 2009. Evaluation of the Intel\u00ae core\u2122 i7 turbo boost feature. In International Symposium on Workload Characterization (IISWC\u201909)."},{"key":"e_1_3_2_21_2","article-title":"PARSECSs: Evaluating the impact of task parallelism in the PARSEC benchmark suite","author":"Chasapis Dimitrios","year":"2015","unstructured":"Dimitrios Chasapis, Marc Casas, Miquel Moret\u00f3, Raul Vidal, Eduard Ayguad\u00e9, Jes\u00fas Labarta, and Mateo Valero. 2015. PARSECSs: Evaluating the impact of task parallelism in the PARSEC benchmark suite. 
Transactions on Architecture and Code Optimization (TACO) (2015).","journal-title":"Transactions on Architecture and Code Optimization (TACO)"},{"key":"e_1_3_2_22_2","volume-title":"Workshop on Feedback-Directed and Dynamic Optimization","author":"Chen Wen-Ke","year":"2000","unstructured":"Wen-Ke Chen, Sorin Lerner, Ronnie Chaiken, and David M. Gillies. 2000. Mojo: A dynamic optimization system. In Workshop on Feedback-Directed and Dynamic Optimization."},{"key":"e_1_3_2_23_2","volume-title":"Conference on Uncertainty in Artificial Intelligence (UAI\u201900)","author":"Dasgupta Sanjoy","year":"2000","unstructured":"Sanjoy Dasgupta. 2000. Experiments with random projection. In Conference on Uncertainty in Artificial Intelligence (UAI\u201900)."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2015.7095792"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2019.00038"},{"key":"e_1_3_2_26_2","article-title":"A guide to vectorization with Intel C++ compilers","author":"Deilmann M.","year":"2012","unstructured":"M. Deilmann. 2012. A guide to vectorization with Intel C++ compilers. Intel Corporation (2012).","journal-title":"Intel Corporation"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3127068"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.38"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626411000151"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/bxh103"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2010.5452069"},{"key":"e_1_3_2_32_2","volume-title":"International Conference on Machine Learning (ICML\u201903)","author":"Elkan Charles","year":"2003","unstructured":"Charles Elkan. 2003. Using the triangle inequality to accelerate k-means. 
In International Conference on Machine Learning (ICML\u201903)."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/1952998.1952999"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/1534909.1534910"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480045"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2007.363738"},{"key":"e_1_3_2_37_2","article-title":"Cluster analysis of multivariate data: Efficiency versus interpretability of classifications","author":"Forgy Edward W.","year":"1965","unstructured":"Edward W. Forgy. 1965. Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics (1965).","journal-title":"Biometrics"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2016.7482104"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446726"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2003.1190246"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446098"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.8"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/2541228.2555305"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2007.363739"},{"key":"e_1_3_2_45_2","volume-title":"International Symposium on High Performance Computer Architecture (HPCA\u201908)","author":"Kim Wonyoung","year":"2008","unstructured":"Wonyoung Kim, Meeta S. Gupta, Gu-Yeon Wei, and David Brooks. 2008. System level analysis of fast, per-core DVFS using on-chip switching regulators. In International Symposium on High Performance Computer Architecture (HPCA\u201908)."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00035"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.32"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2005.1430578"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1147\/sj.92.0078"},{"key":"e_1_3_2_50_2","article-title":"Master: A multicore cache energy-saving technique using dynamic cache reconfiguration","author":"Mittal Sparsh","year":"2013","unstructured":"Sparsh Mittal, Yanan Cao, and Zhao Zhang. 2013. Master: A multicore cache energy-saving technique using dynamic cache reconfiguration. Transactions on Very Large Scale Integration Systems (VLSI) (2013).","journal-title":"Transactions on Very Large Scale Integration Systems (VLSI)"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2013.6657031"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358264"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/SAMOS.2016.7818337"},{"key":"e_1_3_2_54_2","unstructured":"OpenMP [n.d.]. OpenMP 3.1 API C\/C++ Syntax Quick Reference Card. https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP3.1-CCard.pdf"},{"key":"e_1_3_2_55_2","volume-title":"Hot Chips Symposium (HCS\u201920)","author":"Papazian Irma Esmer","year":"2020","unstructured":"Irma Esmer Papazian. 2020. New 3rd Gen Intel\u00ae Xeon\u00ae scalable processor (codename: Ice Lake-SP). In Hot Chips Symposium (HCS\u201920)."},{"key":"e_1_3_2_56_2","volume-title":"Workshop on Reproducible Research Methodologies (REPRODUCE\u201914)","author":"Patil Harish","year":"2014","unstructured":"Harish Patil and Trevor E. Carlson. 2014. Pinballs: Portable and shareable user-level checkpoints for reproducible analysis and simulation. 
In Workshop on Reproducible Research Methodologies (REPRODUCE\u201914)."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO51591.2021.9370340"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772958"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2006.1639325"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3354069"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00051"},{"key":"e_1_3_2_62_2","unstructured":"Alen Sabu, Harish Patil, Wim Heirman, and Trevor E. Carlson. 2022. LoopPoint Tools. https:\/\/github.com\/nus-comparch\/looppoint"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485963"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2015.29"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/1024393.1024414"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605403"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/871656.859657"},{"key":"e_1_3_2_68_2","article-title":"Efficient sampling startup for SimPoint","author":"Biesbrouck Michael Van","year":"2006","unstructured":"Michael Van Biesbrouck, Brad Calder, and Lieven Eeckhout. 2006. Efficient sampling startup for SimPoint. 
IEEE Micro (2006).","journal-title":"IEEE Micro"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2004.1291355"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/379539.379583"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.79"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859629"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1145\/3545008.3545042"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2015.7095784"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480042"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.7"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3680548","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3680548","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:11Z","timestamp":1750295891000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3680548"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,20]]},"references-count":75,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3680548"],"URL":"https:\/\/doi.org\/10.1145\/3680548","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,20]]},"assertion":[{"value":"2023-11-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-06-26","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}