{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:16:13Z","timestamp":1750306573956,"version":"3.41.0"},"reference-count":10,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2014,12,3]],"date-time":"2014-12-03T00:00:00Z","timestamp":1417564800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2014,12,3]]},"abstract":"<jats:p>In recent years, heterogeneous clusters using accelerators are often used for high performance computing systems. In such clusters, inter-node communication between accelerators requires several memory copies via CPU memory, and the communication latency incurred severely reduces performance. To solve this problem, we have been proposing a Tightly Coupled Accelerators (TCA) architecture intended to reduce the communication latency between accelerators over different nodes. In the TCA architecture, PCI Express packets are used for communication among GPUs over nodes. We developed a communication chip that we call the named PEACH2 chip, to help implement the TCA architecture. In this paper, we describe the details of the design and implementation of the PEACH2 chip, with respect to its routing mechanism and its DMA controller using FPGA. We evaluated the PEACH2 on a new platform that uses the latest Xeon CPU, IvyBridge, and achieved 2.3 GBytes\/sec between GPUs over nodes, while the performance was only 880 MBytes\/sec on the previous platform with SandyBridge.<\/jats:p>","DOI":"10.1145\/2693714.2693716","type":"journal-article","created":{"date-parts":[[2014,12,8]],"date-time":"2014-12-08T16:17:14Z","timestamp":1418055434000},"page":"3-8","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["PEACH2"],"prefix":"10.1145","volume":"42","author":[{"given":"Yuetsu","family":"Kodama","sequence":"first","affiliation":[{"name":"University of Tsukuba, Tsukuba, Ibaraki, Japan"}]},{"given":"Toshihiro","family":"Hanawa","sequence":"additional","affiliation":[{"name":"The University of Tokyo, Kashiwa, Chiba, Japan"}]},{"given":"Taisuke","family":"Boku","sequence":"additional","affiliation":[{"name":"University of Tsukuba, Tsukuba, Ibaraki, Japan"}]},{"given":"Mitsuhisa","family":"Sato","sequence":"additional","affiliation":[{"name":"University of Tsukuba, Tsukuba, Ibaraki, Japan"}]}],"member":"320","published-online":{"date-parts":[[2014,12,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2013.226"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/HOTI.2013.15"},{"key":"e_1_2_1_3_1","first-page":"2010","volume":"0","author":"Express Base Specification PCI","unstructured":"PCI Express Base Specification , Rev. 3 . 0 , PCI-SIG, Nov. 2010 . PCI Express Base Specification, Rev. 3.0, PCI-SIG, Nov. 2010.","journal-title":"Rev. 3"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.93"},{"key":"e_1_2_1_5_1","unstructured":"Stratix IV Device Handbook Altera Corp. {Online}. Available: http:\/\/www.altera.co.jp\/literature\/lit-stratix-iv.jsp  Stratix IV Device Handbook Altera Corp. {Online}. Available: http:\/\/www.altera.co.jp\/literature\/lit-stratix-iv.jsp"},{"key":"e_1_2_1_6_1","unstructured":"IP Compiler for PCI Express user guide Altera Corp. http:\/\/www.altera.com\/literature\/ug\/ug pci express.pdf.  IP Compiler for PCI Express user guide Altera Corp. http:\/\/www.altera.com\/literature\/ug\/ug pci express.pdf."},{"key":"e_1_2_1_7_1","unstructured":"Developing A Linux Kernel Module Using RDMA For GPUDirect NVIDIA Corp. {Online}. Available: http:\/\/developer.download.nvidia.com\/compute\/cuda\/5 0\/rc\/docs\/GPUDirect RDMA.pdf  Developing A Linux Kernel Module Using RDMA For GPUDirect NVIDIA Corp. {Online}. Available: http:\/\/developer.download.nvidia.com\/compute\/cuda\/5 0\/rc\/docs\/GPUDirect RDMA.pdf"},{"key":"e_1_2_1_8_1","unstructured":"J. Gudmundson \"Enabling multi-host system designs with PCI Express technology \" PLX Technology Inc. May 2004. {Online}. Available: http:\/\/www.plxtech.com\/products\/expresslane\/techinfo  J. Gudmundson \"Enabling multi-host system designs with PCI Express technology \" PLX Technology Inc. May 2004. {Online}. Available: http:\/\/www.plxtech.com\/products\/expresslane\/techinfo"},{"key":"e_1_2_1_9_1","volume-title":"Conference Series","volume":"331","author":"Ammendola R.","year":"2011","unstructured":"R. Ammendola : high bandwidth 3D torus direct network for petaflops scale commodity clusters,\" in Journal of Physics, ser . Conference Series , vol. 331 , Part 5, no. 5 , 2011 . R. Ammendola et al., \"APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters,\" in Journal of Physics, ser. Conference Series, vol. 331, Part 5, no. 5, 2011."},{"volume-title":"version 1.2 (Nov. 20","year":"2013","key":"e_1_2_1_10_1","unstructured":"Specification of XcalableMP , version 1.2 (Nov. 20 . 2013 ). {Online}. Available: http:\/\/www.xcalablemp.org\/ Specification of XcalableMP, version 1.2 (Nov. 20. 2013). {Online}. Available: http:\/\/www.xcalablemp.org\/"}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2693714.2693716","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2693714.2693716","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:31Z","timestamp":1750227211000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2693714.2693716"}},"subtitle":["An FPGA-based PCIe network device for Tightly Coupled Accelerators"],"short-title":[],"issued":{"date-parts":[[2014,12,3]]},"references-count":10,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,12,3]]}},"alternative-id":["10.1145\/2693714.2693716"],"URL":"https:\/\/doi.org\/10.1145\/2693714.2693716","relation":{},"ISSN":["0163-5964"],"issn-type":[{"type":"print","value":"0163-5964"}],"subject":[],"published":{"date-parts":[[2014,12,3]]},"assertion":[{"value":"2014-12-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}