{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T08:57:20Z","timestamp":1743065840612,"version":"3.38.0"},"reference-count":30,"publisher":"SAGE Publications","issue":"5","license":[{"start":{"date-parts":[[2020,1,3]],"date-time":"2020-01-03T00:00:00Z","timestamp":1578009600000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"DOI":"10.13039\/501100003382","name":"Core Research for Evolutional Science and Technology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003382","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2019,9]]},"abstract":"<jats:p> Accelerated clusters, which are cluster systems equipped with accelerators, are one of the most common systems in parallel computing. In order to exploit the performance of such systems, it is important to reduce communication latency between accelerator memories. In addition, there is also a need for a programming language that facilitates the maintenance of high performance by such systems. The goal of the present article is to evaluate XcalableACC (XACC), a parallel programming language, with tightly coupled accelerators\/InfiniBand (TCAs\/IB) hybrid communication on an accelerated cluster. TCA\/IB hybrid communication is a combination of low-latency communication with TCA and high bandwidth with IB. The XACC language, which is a directive-based language for accelerated clusters, enables programmers to use TCA\/IB hybrid communication with ease. In order to evaluate the performance of XACC with TCA\/IB hybrid communication, we implemented the lattice quantum chromodynamics (LQCD) mini-application and evaluated the application on our accelerated cluster using up to 64 compute nodes. We also implemented the LQCD mini-application using a combination of CUDA and MPI (CUDA + MPI) and that of OpenACC and MPI (OpenACC + MPI) for comparison with XACC. Performance evaluation revealed that the performance of XACC with TCA\/IB hybrid communication is 9% better than that of CUDA + MPI and 18% better than that of OpenACC + MPI. Furthermore, the performance of XACC was found to further increase by 7% by new expansion to XACC. Productivity evaluation revealed that XACC requires much less change from the serial LQCD code to implement the parallel LQCD code than CUDA + MPI and OpenACC + MPI. Moreover, since XACC can perform parallelization while maintaining the sequential code image, XACC is highly readable and shows excellent portability due to its directive-based approach. <\/jats:p>","DOI":"10.1177\/1094342018821163","type":"journal-article","created":{"date-parts":[[2019,1,4]],"date-time":"2019-01-04T06:31:52Z","timestamp":1546583512000},"page":"869-884","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["Evaluation of XcalableACC with tightly coupled accelerators\/InfiniBand hybrid communication on accelerated cluster"],"prefix":"10.1177","volume":"33","author":[{"given":"Masahiro","family":"Nakao","sequence":"first","affiliation":[{"name":"RIKEN Center for Computational Science, Kobe, Japan"}]},{"given":"Tetsuya","family":"Odajima","sequence":"additional","affiliation":[{"name":"RIKEN Center for Computational Science, Kobe, Japan"}]},{"given":"Hitoshi","family":"Murai","sequence":"additional","affiliation":[{"name":"RIKEN Center for Computational Science, Kobe, Japan"}]},{"given":"Akihiro","family":"Tabuchi","sequence":"additional","affiliation":[{"name":"Fujitsu Laboratories Ltd, Kawasaki, Japan"}]},{"given":"Norihisa","family":"Fujita","sequence":"additional","affiliation":[{"name":"Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan"}]},{"given":"Toshihiro","family":"Hanawa","sequence":"additional","affiliation":[{"name":"Information Technology Center, The University of Tokyo, Tokyo, Japan"}]},{"given":"Taisuke","family":"Boku","sequence":"additional","affiliation":[{"name":"Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan"},{"name":"Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan"}]},{"given":"Mitsuhisa","family":"Sato","sequence":"additional","affiliation":[{"name":"RIKEN Center for Computational Science, Kobe, Japan"}]}],"member":"179","published-online":{"date-parts":[[2019,1,3]]},"reference":[{"key":"bibr1-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/331\/5\/052029"},{"key":"bibr2-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2013.128"},{"key":"bibr3-1094342018821163","unstructured":"Bridge++ (2017) Available at: http:\/\/bridge.kek.jpLattice-codeindex_e.html (accessed 26 March 2018)."},{"key":"bibr4-1094342018821163","first-page":"8:1","volume-title":"Proceedings of the 2011 ACM SIGPLAN X10 workshop, X10\u201911","author":"Dave C","year":"2011"},{"key":"bibr5-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/XSW.2013.7"},{"key":"bibr6-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2013.226"},{"key":"bibr7-1094342018821163","doi-asserted-by":"publisher","DOI":"10.2172\/1169830"},{"key":"bibr8-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1145\/2693714.2693716"},{"key":"bibr9-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/CANDAR.2014.44"},{"key":"bibr10-1094342018821163","unstructured":"Matsufuru H (2009) Available at: http:\/\/research.kek.jppeoplematufuruResearchProgramsTuning_CppSolv_Wilson_Cpp (accessed 26 March 2018)."},{"key":"bibr11-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2015.102"},{"key":"bibr12-1094342018821163","first-page":"67:1","volume-title":"Proceedings of the international conference on high performance computing, networking, storage and analysis, SC12","author":"Michael G","year":"2012"},{"key":"bibr13-1094342018821163","unstructured":"MVAPICH (2018) Available at: http:\/\/mvapich.cse.ohio-state.edu (accessed 26 March 2018)."},{"key":"bibr14-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/WACCPD.2014.6"},{"key":"bibr15-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1177\/1094342017698214"},{"key":"bibr16-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2017.58"},{"key":"bibr17-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2015.112"},{"key":"bibr18-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1177\/1094342015626584"},{"key":"bibr19-1094342018821163","unstructured":"Omni Compiler (2018) Available at: http:\/\/omni-compiler.org (accessed 26 March 2018)."},{"key":"bibr20-1094342018821163","unstructured":"PGI-SIG (2017) PCI Express External Cabling Specification Revision 1.0."},{"key":"bibr21-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2012.60"},{"volume-title":"Proceedings of the fifth conference on partitioned global address space programming models","year":"2011","author":"Stone AI","key":"bibr22-1094342018821163"},{"key":"bibr23-1094342018821163","unstructured":"Stratix IV (2014) Altera Corp. Available at: http:\/\/www.altera.co.jp (accessed 26 March 2018)."},{"volume-title":"Proceedings of the second workshop on accelerator programming using directives, WACCPD \u201815","year":"2015","author":"Stu B","key":"bibr24-1094342018821163"},{"key":"bibr25-1094342018821163","doi-asserted-by":"crossref","unstructured":"Tabuchi A, Nakao M, Sato M (2013) A source-to-source OpenACC compiler for CUDA. In: Euro-Par workshops, Aachen, Germany, 26\u201330 August 2013, pp. 178\u2013187.","DOI":"10.1007\/978-3-642-54420-0_18"},{"key":"bibr26-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2017.81"},{"key":"bibr27-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.10.2445"},{"key":"bibr28-1094342018821163","unstructured":"XcalableACC Specification (2018) Available at: http:\/\/xcalablemp.orgXACC.html (accessed 26 March 2018)."},{"key":"bibr29-1094342018821163","unstructured":"XcalableMP Specification (2017). Available at: http:\/\/xcalablemp.org (accessed 26 March 2018)."},{"key":"bibr30-1094342018821163","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2016.50"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018821163","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342018821163","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018821163","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018821163","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T23:01:40Z","timestamp":1740870100000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342018821163"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,3]]},"references-count":30,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2019,9]]}},"alternative-id":["10.1177\/1094342018821163"],"URL":"https:\/\/doi.org\/10.1177\/1094342018821163","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2019,1,3]]}}}