{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T02:42:28Z","timestamp":1775788948810,"version":"3.50.1"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2018,3,22]],"date-time":"2018-03-22T00:00:00Z","timestamp":1521676800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation, including CAREER","award":["CCF-1350624 and ECCS-1232164"],"award-info":[{"award-number":["CCF-1350624 and ECCS-1232164"]}]},{"name":"NSF Graduate Research Fellowship","award":["1002809"],"award-info":[{"award-number":["1002809"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2018,3,31]]},"abstract":"<jats:p>Trace-driven simulation of chip multiprocessor (CMP) systems offers many advantages over execution-driven simulation, such as reducing simulation time and complexity, allowing portability, and scalability. However, trace-based simulation approaches have difficulty capturing and accurately replaying multithreaded traces due to the inherent nondeterminism in the execution of multithreaded programs. In this work, we present SynchroTrace, a scalable, flexible, and accurate trace-based multithreaded simulation methodology. By recording synchronization events relevant to modern threading libraries (e.g., Pthreads and OpenMP) and dependencies in the traces, independent of the host architecture, the methodology is able to accurately model the nondeterminism of multithreaded programs for different hardware platforms and threading paradigms. Through capturing high-level instruction categories, the SynchroTrace average CPI trace Replay timing model offers fast and accurate simulation of many-core in-order CMPs. We perform two case studies to validate the SynchroTrace simulation flow against the gem5 full-system simulator: (1) a constraint-based design space exploration with traditional CMP benchmarks and (2) a thread-scalability study with HPC-representative applications. The results from these case studies show that (1) our trace-based approach with trace filtering has a peak speedup of up to 18.7\u00d7 over simulation in gem5 full-system with an average of 9.6\u00d7 speedup, (2) SynchroTrace maintains the thread-scaling accuracy of gem5 and can efficiently scale up to 64 threads, and (3) SynchroTrace can trace in one platform and model any platform in early stages of design.<\/jats:p>","DOI":"10.1145\/3158642","type":"journal-article","created":{"date-parts":[[2018,3,23]],"date-time":"2018-03-23T12:29:49Z","timestamp":1521808189000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["SynchroTrace"],"prefix":"10.1145","volume":"15","author":[{"given":"Karthik","family":"Sangaiah","sequence":"first","affiliation":[{"name":"Drexel University, Philadelphia, PA"}]},{"given":"Michael","family":"Lui","sequence":"additional","affiliation":[{"name":"Drexel University, Philadelphia, PA"}]},{"given":"Radhika","family":"Jagtap","sequence":"additional","affiliation":[{"name":"ARM Ltd., Cambridge, UK"}]},{"given":"Stephan","family":"Diestelhorst","sequence":"additional","affiliation":[{"name":"ARM Ltd., Cambridge, UK"}]},{"given":"Siddharth","family":"Nilakantan","sequence":"additional","affiliation":[{"name":"NVIDIA Corporation"}]},{"given":"Ankit","family":"More","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Baris","family":"Taskin","sequence":"additional","affiliation":[{"name":"Drexel University, Philadelphia, PA"}]},{"given":"Mark","family":"Hempstead","sequence":"additional","affiliation":[{"name":"Tufts University, Medford, MA"}]}],"member":"320","published-online":{"date-parts":[[2018,3,22]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the IASTED Conference on Parallel and Distributed Computing and Systems. 617--662","author":"Beyls K.","unstructured":"K. Beyls and E. D\u2019Hollander . 2001. Reuse distance as a metric for cache behavior . In Proceedings of the IASTED Conference on Parallel and Distributed Computing and Systems. 617--662 . K. Beyls and E. D\u2019Hollander. 2001. Reuse distance as a metric for cache behavior. In Proceedings of the IASTED Conference on Parallel and Distributed Computing and Systems. 617--662."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_4_1","volume-title":"Technical Report MSU-CSE-99-31. Department of Computer Science and Engineering","author":"Brehob M.","year":"1999","unstructured":"M. Brehob and R. Enbody . 1999 . An Analytical Model of Locality and Caching . Technical Report MSU-CSE-99-31. Department of Computer Science and Engineering , Michigan State University . M. Brehob and R. Enbody. 1999. An Analytical Model of Locality and Caching. Technical Report MSU-CSE-99-31. Department of Computer Science and Engineering, Michigan State University."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201914)","author":"Carlson T. E.","unstructured":"T. E. Carlson , W. Heirman , K. V. Craeynest , and L. Eeckhout . 2014. Barrierpoint: Sampled simulation of multi-threaded applications . In Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201914) . 2--12. T. E. Carlson, W. Heirman, K. V. Craeynest, and L. Eeckhout. 2014. Barrierpoint: Sampled simulation of multi-threaded applications. In Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201914). 2--12."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063454"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/Co-HPC.2014.6"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2014.2349522"},{"key":"e_1_2_1_9_1","volume-title":"n.d. Home Page. Retrieved","author":"Initiative Exascale","year":"2018","unstructured":"Exascale Initiative . n.d. Home Page. Retrieved February 13, 2018 , from http:\/\/www.exascaleinitiative.org\/. Exascale Initiative. n.d. Home Page. Retrieved February 13, 2018, from http:\/\/www.exascaleinitiative.org\/."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/BDC.2014.16"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 2010 IEEE International Symposium on Performance Analysis of Systems Software (ISPASS\u201910)","author":"Ganesan K.","unstructured":"K. Ganesan , J. Jo , and L. K. John . 2010. Synthesizing memory-level parallelism aware miniature clones for SPEC CPU2006 and ImplantBench workloads . In Proceedings of the 2010 IEEE International Symposium on Performance Analysis of Systems Software (ISPASS\u201910) . 33--44. K. Ganesan, J. Jo, and L. K. John. 2010. Synthesizing memory-level parallelism aware miniature clones for SPEC CPU2006 and ImplantBench workloads. In Proceedings of the 2010 IEEE International Symposium on Performance Analysis of Systems Software (ISPASS\u201910). 33--44."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2013.36"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/166962.167001"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1921249.1921258"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228428"},{"key":"e_1_2_1_16_1","volume-title":"n.d. Intel Xeon E5-2667. Retrieved","year":"2018","unstructured":"Intel. n.d. Intel Xeon E5-2667. Retrieved February 13, 2018 , from http:\/\/ark.intel.com\/products\/83361. Intel. n.d. Intel Xeon E5-2667. Retrieved February 13, 2018, from http:\/\/ark.intel.com\/products\/83361."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 2016 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS\u201916)","author":"Jagtap R.","unstructured":"R. Jagtap , S. Diestelhorst , A. Hansoon , M. Jung , and N. When . 2016. Exploring system performance using Elastic Traces: Fast, accurate and portable . In Proceedings of the 2016 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS\u201916) . 96--105. R. Jagtap, S. Diestelhorst, A. Hansoon, M. Jung, and N. When. 2016. Exploring system performance using Elastic Traces: Fast, accurate and portable. In Proceedings of the 2016 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS\u201916). 96--105."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/LES.2015.2402197"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.115"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201915)","author":"Kestor G.","unstructured":"G. Kestor , R. Gioiosa , and D. Chavarria-Miranda . 2015. Prometheus: Scalable and accurate emulation of task-based applications on many-core systems . In Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201915) . 308--317. G. Kestor, R. Gioiosa, and D. Chavarria-Miranda. 2015. Prometheus: Scalable and accurate emulation of task-based applications on many-core systems. In Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201915). 308--317."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2015.2414456"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.13182\/NSE16-33"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201910)","author":"Miller J. E.","unstructured":"J. E. Miller , H. Kasture , G. Kurian , C. Gruenwald , N. Beckmann , C. Celio , J. Eastep , and A. Agarwal . 2010. Graphite: A distributed parallel simulator for multicores . In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201910) . J. E. Miller, H. Kasture, G. Kurian, C. Gruenwald, N. Beckmann, C. Celio, J. Eastep, and A. Agarwal. 2010. Graphite: A distributed parallel simulator for multicores. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201910)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.30"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1254810.1254820"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273442.1250746"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC\u201913)","author":"Nilakantan S.","unstructured":"S. Nilakantan and M. Hempstead . 2013. Platform-independent analysis of function-level communication in workloads . In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC\u201913) . S. Nilakantan and M. Hempstead. 2013. Platform-independent analysis of function-level communication in workloads. In Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC\u201913)."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the International Conference on VLSI Design and the 14th International Conference on Embedded System Design (VLSID ES\u201915)","author":"Nilakantan S.","unstructured":"S. Nilakantan , S. Lerner , M. Hempstead , and B. Taskin . 2015. Can you trust your memory trace? A comparison of memory traces from binary instrumentation and simulation . In Proceedings of the International Conference on VLSI Design and the 14th International Conference on Embedded System Design (VLSID ES\u201915) . S. Nilakantan, S. Lerner, M. Hempstead, and B. Taskin. 2015. Can you trust your memory trace? A comparison of memory traces from binary instrumentation and simulation. In Proceedings of the International Conference on VLSI Design and the 14th International Conference on Embedded System Design (VLSID ES\u201915)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1999946.1999971"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772958"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/2015039.2015523"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485963"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the International Conference on High Performance Computing and Simulation (HPCS\u201911)","author":"Trivino F.","unstructured":"F. Trivino , F. J. Andujar , F. J. Alfaro , and J. L. Sanchez . 2011. Self-related traces: An alternative to full-system simulation for NoCs . In Proceedings of the International Conference on High Performance Computing and Simulation (HPCS\u201911) . F. Trivino, F. J. Andujar, F. J. Alfaro, and J. L. Sanchez. 2011. Self-related traces: An alternative to full-system simulation for NoCs. In Proceedings of the International Conference on High Performance Computing and Simulation (HPCS\u201911)."},{"key":"e_1_2_1_35_1","volume-title":"Retrieved","author":"Function-Wrapping Valgrind","year":"2017","unstructured":"Valgrind Function-Wrapping . 2017 . Home Page . Retrieved February 13, 2018, from http:\/\/valgrind.org\/docs\/manual\/manual-core-adv.html#manual-core-adv.wrapping. Valgrind Function-Wrapping. 2017. Home Page. Retrieved February 13, 2018, from http:\/\/valgrind.org\/docs\/manual\/manual-core-adv.html#manual-core-adv.wrapping."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2581122.2544152"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.223990"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/301177.301496"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3158642","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3158642","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:26:10Z","timestamp":1750213570000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3158642"}},"subtitle":["Synchronization-Aware Architecture-Agnostic Traces for Lightweight Multicore Simulation of CMP and HPC Workloads"],"short-title":[],"issued":{"date-parts":[[2018,3,22]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,3,31]]}},"alternative-id":["10.1145\/3158642"],"URL":"https:\/\/doi.org\/10.1145\/3158642","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,3,22]]},"assertion":[{"value":"2017-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-03-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}