{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:14:26Z","timestamp":1750306466232,"version":"3.41.0"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,11,16]],"date-time":"2015-11-16T00:00:00Z","timestamp":1447632000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2016,1,7]]},"abstract":"<jats:p>\n            Sampled microarchitectural simulation of single-threaded applications is mature technology for over a decade now. Sampling multithreaded applications, on the other hand, is much more complicated. Not until very recently have researchers proposed solutions for sampled simulation of multithreaded applications.\n            <jats:italic>Time-Based Sampling (TBS)<\/jats:italic>\n            samples multithreaded application execution based on time\u2014not instructions as is typically done for single-threaded applications\u2014yielding estimates for a multithreaded application\u2019s execution time. In this article, we revisit and analyze previously proposed TBS approaches (periodic and cantor fractal based sampling), and we obtain a number of novel and surprising insights, such as (i) accurately estimating\n            <jats:italic>fast-forwarding IPC<\/jats:italic>\n            , that is, performance in-between sampling units, is more important than accurately estimating\n            <jats:italic>sample IPC<\/jats:italic>\n            , that is, performance within the sampling units; (ii) fast-forwarding IPC estimation accuracy is determined by both the sampling unit distribution and how to use the sampling units to predict fast-forwarding IPC; and (iii) cantor sampling is more accurate at small sampling unit sizes, whereas periodic is more accurate at large sampling unit sizes.\n          <\/jats:p>\n          <jats:p>\n            These insights lead to the development of\n            <jats:italic>Two-level Hybrid Sampling (THS)<\/jats:italic>\n            , a novel sampling methodology for multithreaded applications that combines periodic sampling\u2019s accuracy at large time scales (i.e., uniformly selecting coarse-grain sampling units across the entire program execution) with cantor sampling\u2019s accuracy at small time scales (i.e., the ability to accurately predict fast-forwarding IPC in-between small sampling units). The clustered occurrence of small sampling units under cantor sampling also enables shortened warmup and thus enhanced simulation speed. Overall, THS achieves an average absolute execution time prediction error of 4% while yielding an average simulation speedup of 40 \u00d7 compared to detailed simulation, which is both more accurate and faster than the current state-of-the-art. Case studies illustrate THS\u2019 ability to accurately predict relative performance differences across the design space.\n          <\/jats:p>","DOI":"10.1145\/2818353","type":"journal-article","created":{"date-parts":[[2015,11,18]],"date-time":"2015-11-18T13:42:28Z","timestamp":1447854148000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Two-Level Hybrid Sampled Simulation of Multithreaded Applications"],"prefix":"10.1145","volume":"12","author":[{"given":"Chuntao","family":"Jiang","sequence":"first","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhibin","family":"Yu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Advanced Technology, CAS, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lieven","family":"Eeckhout","sequence":"additional","affiliation":[{"name":"Ghent University, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hai","family":"Jin","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaofei","family":"Liao","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chengzhong","family":"Xu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Advanced Technology, China\/Wayne State University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,11,16]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.73"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522340"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1496909.1496921"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454128"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.675632"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2012.23"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063454"},{"volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2--12","author":"Carlson T. E.","key":"e_1_2_2_9_1","unstructured":"T. E. Carlson , W. Heirman , and L. Eeckhout . 2013. Sampled simulation of multi-threaded applications . In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2--12 . T. E. Carlson, W. Heirman, and L. Eeckhout. 2013. Sampled simulation of multi-threaded applications. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2--12."},{"volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2--12","author":"Carlson T. E.","key":"e_1_2_2_10_1","unstructured":"T. E. Carlson , W. Heirman , K. Van Craeynest , and L. Eeckhout . 2014. BarrierPoint: Sampled simulation of multi-threaded applications . In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2--12 . T. E. Carlson, W. Heirman, K. Van Craeynest, and L. Eeckhout. 2014. BarrierPoint: Sampled simulation of multi-threaded applications. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2--12."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.47"},{"volume-title":"Proceedings of the International Conference on Computer Design (ICCD). 468--477","author":"Conte T. M.","key":"e_1_2_2_12_1","unstructured":"T. M. Conte , M. A. Hirsch , and K. N. Menezes . 1996. Reducing state loss for effective trace sampling of superscalar processors . In Proceedings of the International Conference on Computer Design (ICCD). 468--477 . T. M. Conte, M. A. Hirsch, and K. N. Menezes. 1996. Reducing state loss for effective trace sampling of superscalar processors. In Proceedings of the International Conference on Computer Design (ICCD). 468--477."},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). 220--231","author":"Duesterwald E.","key":"e_1_2_2_13_1","unstructured":"E. Duesterwald , C. Cascaval , and S. Dwarkadas . 2003. Characterizing and predicting program behavior and its variability . In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). 220--231 . E. Duesterwald, C. Cascaval, and S. Dwarkadas. 2003. Characterizing and predicting program behavior and its variability. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). 220--231."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2005.1430562"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.982918"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815971"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541228.2555305"},{"volume-title":"Aggregating performance metrics over a benchmark suite","author":"John L. K.","key":"e_1_2_2_18_1","unstructured":"L. K. John . 2006. Aggregating performance metrics over a benchmark suite . In Performance Evaluation and Benchmarking, L. K. John and L. Eeckhout (Eds.). CRC Press , 47--58. L. K. John. 2006. Aggregating performance metrics over a benchmark suite. In Performance Evaluation and Benchmarking, L. K. John and L. Eeckhout (Eds.). CRC Press, 47--58."},{"volume-title":"Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 1--12","author":"Miller J. E.","key":"e_1_2_2_19_1","unstructured":"J. E. Miller , H. Kasture , G. Kurian , C. Gruenwald , N. Beckmann , C. Celio , J. Eastep , and A. Agarwal . 2010. Graphite: A distributed parallel simulator for multicores . In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 1--12 . J. E. Miller, H. Kasture, G. Kurian, C. Gruenwald, N. Beckmann, C. Celio, J. Eastep, and A. Agarwal. 2010. Graphite: A distributed parallel simulator for multicores. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 1--12."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024724.2024954"},{"key":"e_1_2_2_21_1","doi-asserted-by":"crossref","unstructured":"H. O. Peitgen H. Jurgens and D. Saupe. 2004. Chaos and Fractals: New Frontiers of Science. Springer.   H. O. Peitgen H. Jurgens and D. Saupe. 2004. Chaos and Fractals: New Frontiers of Science. Springer.","DOI":"10.1007\/b97624"},{"volume-title":"Proceedings of the International Conference on Parallel and Distributed Processing Symposium (IPDPS).","author":"Perelman E.","key":"e_1_2_2_22_1","unstructured":"E. Perelman , M. Polito , J.-Y. Bouguet , J. Sampson , B. Calder , and C. Dulong . 2006. Detecting phases in parallel applications on shared memory architectures . In Proceedings of the International Conference on Parallel and Distributed Processing Symposium (IPDPS). E. Perelman, M. Polito, J.-Y. Bouguet, J. Sampson, B. Calder, and C. Dulong. 2006. Detecting phases in parallel applications on shared memory architectures. In Proceedings of the International Conference on Parallel and Distributed Processing Symposium (IPDPS)."},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/166955.166979"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485963"},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). 3--14","author":"Sherwood T.","key":"e_1_2_2_25_1","unstructured":"T. Sherwood , E. Perelman , and B. Calder . 2001. Basic block distribution analysis to find periodic behavior and simulation points in applications . In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). 3--14 . T. Sherwood, E. Perelman, and B. Calder. 2001. Basic block distribution analysis to find periodic behavior and simulation points in applications. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). 3--14."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605403"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370865"},{"volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 207--217","author":"Uzelac V.","key":"e_1_2_2_28_1","unstructured":"V. Uzelac and A. Milenkovic . 2009. Experiment flows and microbenchmarks for reverse engineering of branch predictor structures . In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 207--217 . V. Uzelac and A. Milenkovic. 2009. Experiment flows and microbenchmarks for reverse engineering of branch predictor structures. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 207--217."},{"volume-title":"Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS). 45--56","author":"Van Biesbrouck M.","key":"e_1_2_2_29_1","unstructured":"M. Van Biesbrouck , T. Sherwood , and B. Calder . 2004. A co-phase matrix to guide simultaneous multithreading simulation . In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS). 45--56 . M. Van Biesbrouck, T. Sherwood, and B. Calder. 2004. A co-phase matrix to guide simultaneous multithreading simulation. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS). 45--56."},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.79"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859629"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2006.404"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.8"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2010.74"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2818353","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2818353","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:43:39Z","timestamp":1750225419000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2818353"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,11,16]]},"references-count":34,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,1,7]]}},"alternative-id":["10.1145\/2818353"],"URL":"https:\/\/doi.org\/10.1145\/2818353","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2015,11,16]]},"assertion":[{"value":"2015-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-11-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}