{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,28]],"date-time":"2025-06-28T19:40:04Z","timestamp":1751139604769,"version":"3.41.0"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,12,13]],"date-time":"2017-12-13T00:00:00Z","timestamp":1513123200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"JSPS KAKENHI","award":["JP16H02794 and JP17J09956"],"award-info":[{"award-number":["JP16H02794 and JP17J09956"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2017,12,31]]},"abstract":"<jats:p>Modeling and simulation\/emulation play a major role in research and development of novel Networks-on-Chip (NoCs). However, conventional software simulators are so slow that studying NoCs for emerging many-core systems with hundreds to thousands of cores is challenging. State-of-the-art FPGA-based NoC emulators have shown great potential in speeding up the NoC simulation, but they cannot emulate large-scale NoCs due to the FPGA capacity constraints. Moreover, emulating large-scale NoCs under synthetic workloads on FPGAs typically requires a large amount of memory and thus involves the use of off-chip memory, which makes the overall design much more complicated and may substantially degrade the emulation speed. This article presents methods for fast and cycle-accurate emulation of NoCs with up to thousands of nodes using a single FPGA. We first describe how to emulate a NoC under a synthetic workload using only FPGA on-chip memory (BRAMs). 
We next present a novel use of time-division multiplexing where BRAMs are effectively used for emulating a network using a small number of nodes, thereby overcoming the FPGA capacity constraints. We propose methods for emulating both direct and indirect networks, focusing on the commonly used meshes and fat-trees (<jats:italic>k<\/jats:italic>-ary <jats:italic>n<\/jats:italic>-trees). This is different from prior work that considers only direct networks. Using the proposed methods, we build a NoC emulator, called FNoC, and demonstrate the emulation of some mesh-based and fat-tree-based NoCs with canonical router architectures. Our evaluation results show that (1) the size of the largest NoC that can be emulated depends on only the FPGA on-chip memory capacity; (2) a mesh-based NoC with 16,384 nodes (128\u00d7128 NoC) and a fat-tree-based NoC with 6,144 switch nodes and 4,096 terminal nodes (4-ary 6-tree NoC) can be emulated using a single Virtex-7 FPGA; and (3) when emulating these two NoCs, we achieve, respectively, 5,047\u00d7 and 232\u00d7 speedups over BookSim, one of the most widely used software-based NoC simulators, while maintaining the same level of accuracy.<\/jats:p>","DOI":"10.1145\/3151758","type":"journal-article","created":{"date-parts":[[2017,12,13]],"date-time":"2017-12-13T14:50:37Z","timestamp":1513176637000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Fast and Cycle-Accurate Emulation of Large-Scale Networks-on-Chip Using a Single FPGA"],"prefix":"10.1145","volume":"10","author":[{"given":"Thiem Van","family":"Chu","sequence":"first","affiliation":[{"name":"Tokyo Institute of Technology, Tokyo, Japan"}]},{"given":"Shimpei","family":"Sato","sequence":"additional","affiliation":[{"name":"Tokyo Institute of Technology, Tokyo, Japan"}]},{"given":"Kenji","family":"Kise","sequence":"additional","affiliation":[{"name":"Tokyo Institute of Technology, Tokyo, 
Japan"}]}],"member":"320","published-online":{"date-parts":[[2017,12,13]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2014.04.011"},{"key":"e_1_2_1_2_1","unstructured":"Access IC Lab. 2017. Access Noxim. Retrieved from http:\/\/access.ee.ntu.edu.tw\/noxim\/index.html. Access IC Lab. 2017. Access Noxim. Retrieved from http:\/\/access.ee.ntu.edu.tw\/noxim\/index.html."},{"key":"e_1_2_1_3_1","volume-title":"GARNET: A detailed on-chip network model inside a full-system simulator. In ISPASS. 33--42.","author":"Agarwal N.","year":"2009","unstructured":"N. Agarwal , T. Krishna , L. S. Peh , and N. K. Jha . 2009 . GARNET: A detailed on-chip network model inside a full-system simulator. In ISPASS. 33--42. N. Agarwal, T. Krishna, L. S. Peh, and N. K. Jha. 2009. GARNET: A detailed on-chip network model inside a full-system simulator. In ISPASS. 33--42."},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"M. Badr and N. E. Jerger. 2014. SynFull: Synthetic traffic models capturing cache coherent behaviour. In ISCA. 109--120. M. Badr and N. E. Jerger. 2014. SynFull: Synthetic traffic models capturing cache coherent behaviour. In ISCA. 109--120.","DOI":"10.1145\/2678373.2665691"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2953878"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.877831"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2015.35"},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"T. V. Chu S. Sato and K. Kise. 2015b. Ultra-fast NoC emulation on a single FPGA. In FPL. 1--8. T. V. Chu S. Sato and K. Kise. 2015b. Ultra-fast NoC emulation on a single FPGA. In FPL. 1--8.","DOI":"10.1109\/FPL.2015.7294021"},{"key":"e_1_2_1_11_1","unstructured":"CMU-SAFARI. 2017. NOCulator. Retreived from https:\/\/github.com\/CMU-SAFARI\/NOCulator. CMU-SAFARI. 
2017. NOCulator. Retreived from https:\/\/github.com\/CMU-SAFARI\/NOCulator."},{"key":"e_1_2_1_12_1","unstructured":"W. J. Dally and B. Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers. W. J. Dally and B. Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/996566.996638"},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"N. Jiang D. U. Becker G. Michelogiannakis J. Balfour B. Towles D. E. Shaw J. Kim and W. J. Dally. 2013. A detailed and flexible cycle-accurate network-on-chip simulator. In ISPASS. 86--96. N. Jiang D. U. Becker G. Michelogiannakis J. Balfour B. Towles D. E. Shaw J. Kim and W. J. Dally. 2013. A detailed and flexible cycle-accurate network-on-chip simulator. In ISPASS. 86--96.","DOI":"10.1109\/ISPASS.2013.6557149"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"H. M. Kamali and S. Hessabi. 2016. AdapNoC: A fast and flexible FPGA-based NoC simulator. In FPL. 1--8. H. M. Kamali and S. Hessabi. 2016. AdapNoC: A fast and flexible FPGA-based NoC simulator. In FPL. 1--8.","DOI":"10.1109\/FPL.2016.7577377"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2012.6189224"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2435264.2435287"},{"key":"e_1_2_1_19_1","volume-title":"The Art of Computer Programming, Volume 2: Seminumerical Algorithms","author":"Knuth D. E.","unstructured":"D. E. Knuth . 1997. The Art of Computer Programming, Volume 2: Seminumerical Algorithms ( 3 rd ed.). Addison-Wesley Longman Publishing Co., Inc. D. E. Knuth. 1997. The Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd ed.). 
Addison-Wesley Longman Publishing Co., Inc.","edition":"3"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ReConFig.2008.74"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSID.2011.46"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MEMCOD.2011.5970513"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1999946.1999969"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024724.2024954"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"M. Pellauer M. Adler M. Kinsy A. Parashar and J. Emer. 2011. HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing. In HPCA. 406--417. M. Pellauer M. Adler M. Kinsy A. Parashar and J. Emer. 2011. HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing. In HPCA. 406--417.","DOI":"10.1109\/HPCA.2011.5749747"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1344671.1344685"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2012.2184760"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485963"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2015.11"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837274.1837390"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cam.2016.11.006"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2012.121"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934495.2949544"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2007.39"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/NOCS.2007.18"}],"container-title":["ACM Transactions on Reconfigurable Technology and 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3151758","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3151758","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,28]],"date-time":"2025-06-28T19:09:33Z","timestamp":1751137773000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3151758"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,13]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,12,31]]}},"alternative-id":["10.1145\/3151758"],"URL":"https:\/\/doi.org\/10.1145\/3151758","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2017,12,13]]},"assertion":[{"value":"2016-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-12-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}