{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:53:01Z","timestamp":1750308781091,"version":"3.41.0"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2009,5,23]],"date-time":"2009-05-23T00:00:00Z","timestamp":1243036800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2009,5,23]]},"abstract":"<jats:p>This paper proposes a novel methodology to efficiently simulate shared-memory multiprocessors composed of hundreds of cores. The basic idea is to use thread-level parallelism in the software system and translate it into core-level parallelism in the simulated world. To achieve this, we first augment an existing full-system simulator to identify and separate the instruction streams belonging to the different software threads. Then, the simulator dynamically maps each instruction flow to the corresponding core of the target multi-core architecture, taking into account the inherent thread synchronization of the running applications. Our simulator allows a user to execute any multithreaded application in a conventional full-system simulator and evaluate the performance of the application on a many-core hardware. We carried out extensive simulations on the SPLASH-2 benchmark suite and demonstrated the scalability up to 1024 cores with limited simulation speed degradation vs. the single-core case on a fixed workload. 
The results also show that the proposed technique captures the intrinsic behavior of the SPLASH-2 suite, even when we scale up the number of shared-memory cores beyond the thousand-core limit.<\/jats:p>","DOI":"10.1145\/1577129.1577133","type":"journal-article","created":{"date-parts":[[2009,7,28]],"date-time":"2009-07-28T12:43:55Z","timestamp":1248785035000},"page":"10-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":48,"title":["How to simulate 1000 cores"],"prefix":"10.1145","volume":"37","author":[{"given":"Matteo","family":"Monchiero","sequence":"first","affiliation":[{"name":"Hewlett-Packard Laboratories"}]},{"given":"Jung Ho","family":"Ahn","sequence":"additional","affiliation":[{"name":"Hewlett-Packard Laboratories"}]},{"given":"Ayose","family":"Falc\u00f3n","sequence":"additional","affiliation":[{"name":"Hewlett-Packard Laboratories"}]},{"given":"Daniel","family":"Ortega","sequence":"additional","affiliation":[{"name":"Hewlett-Packard Laboratories"}]},{"given":"Paolo","family":"Faraboschi","sequence":"additional","affiliation":[{"name":"Hewlett-Packard Laboratories"}]}],"member":"320","published-online":{"date-parts":[[2009,7,23]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Ambric. Massively Parallel Processor Array technology. http:\/\/www.ambric.com."},{"key":"e_1_2_1_2_1","unstructured":"AMD Developer Central. AMD SimNow simulator. http:\/\/developer.amd.com\/simnow.aspx."},
{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1496909.1496921"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2008.4523070"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.82"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1344671.1344684"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2007.373608"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the Symposium on Architecture for Networking and Communications Systems (ANCS)","author":"Eatherton W.","year":"2005","unstructured":"W. Eatherton. Keynote address: The push of network processing to the top of the pyramid. In Proceedings of the Symposium on Architecture for Networking and Communications Systems (ANCS), Oct. 2005."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/52400.52442"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/166962.167001"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.41"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1105734.1105739"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the Fourth Annual Workshop on Modeling, Benchmarking and Simulation (MOBS'08)","author":"Jaleel A.","year":"2008","unstructured":"A. Jaleel, R.S. Cohn, C.-K. Luk, and B. Jacob. CMP$im: A Pin-based on-the-fly multi-core cache simulator. In Proceedings of the Fourth Annual Workshop on Modeling, Benchmarking and Simulation (MOBS'08), 2008."},
{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/115952.115977"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065034"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.982916"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1105734.1105747"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/511399.511349"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1127577.1127586"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/166962.166979"},{"key":"e_1_2_1_22_1","volume-title":"January","author":"Renau J.","year":"2005","unstructured":"J. Renau, B. Fraguela, J. Tuck, W. Liu, M. Prvulovic, L. Ceze, S. Sarangi, P. Sack, K. Strauss, and P. Montesinos. SESC simulator, January 2005. http:\/\/sesc.sourceforge.net."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1993.274941"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/178243.178260"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the International Solid-State Circuits Conference (ISSCC 2008)","author":"Stackhouse B.","year":"2008","unstructured":"B. Stackhouse. A 65nm 2-billion-transistor quad-core Itanium processor. In Proceedings of the International Solid-State Circuits Conference (ISSCC 2008), Feb. 2008."},
{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2008.4523067"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2007.39"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1216919.1216936"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.223990"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2007.66"}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1577129.1577133","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1577129.1577133","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:22:43Z","timestamp":1750278163000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1577129.1577133"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,5,23]]},"references-count":29,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2009,5,23]]}},"alternative-id":["10.1145\/1577129.1577133"],"URL":"https:\/\/doi.org\/10.1145\/1577129.1577133","relation":{},"ISSN":["0163-5964"],"issn-type":[{"type":"print","value":"0163-5964"}],"subject":[],"published":{"date-parts":[[2009,5,23]]},"assertion":[{"value":"2009-07-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}