{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,20]],"date-time":"2025-08-20T13:04:15Z","timestamp":1755695055385,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2021,11,29]],"date-time":"2021-11-29T00:00:00Z","timestamp":1638144000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Federal Ministry of Education and Research","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"crossref"}]},{"name":"State of North Rhine-Westphalia"},{"name":"Paderborn Center for Parallel Computing"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2022,3,31]]},"abstract":"<jats:p>N-body methods are one of the essential algorithmic building blocks of high-performance and parallel computing. Previous research has shown promising performance for implementing n-body simulations with pairwise force calculations on FPGAs. However, to avoid challenges with accumulation and memory access patterns, the presented designs calculate each pair of forces twice, along with both force sums of the involved particles. Also, they require large problem instances with hundreds of thousands of particles to reach their respective peak performance, limiting the applicability for strong scaling scenarios. 
This work addresses both issues by presenting a novel FPGA design that uses each calculated force twice and overlaps data transfers and computations in a way that allows it to reach peak performance even for small problem instances, outperforming previous single-precision results even in double precision, and scaling linearly over multiple interconnected FPGAs. For a comparison across architectures, we provide an equally optimized CPU reference, which for large problems actually achieves higher peak performance per device. However, given the strong scaling advantages of the FPGA design, in parallel setups with a few thousand particles per device, the FPGA platform achieves the highest performance and power efficiency.<\/jats:p>","DOI":"10.1145\/3491235","type":"journal-article","created":{"date-parts":[[2021,11,29]],"date-time":"2021-11-29T19:21:07Z","timestamp":1638213667000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["The Strong Scaling Advantage of FPGAs in HPC for N-body Simulations"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2414-9809","authenticated-orcid":false,"given":"Johannes","family":"Menzel","sequence":"first","affiliation":[{"name":"Paderborn University, Department of Computer Science and Paderborn Center for Parallel Computing, Paderborn, North Rhine-Westphalia, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5728-9982","authenticated-orcid":false,"given":"Christian","family":"Plessl","sequence":"additional","affiliation":[{"name":"Paderborn University, Department of Computer Science and Paderborn Center for Parallel Computing, Paderborn, North Rhine-Westphalia, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5088-0267","authenticated-orcid":false,"given":"Tobias","family":"Kenter","sequence":"additional","affiliation":[{"name":"Paderborn University, Department of Computer Science and Paderborn Center for Parallel 
Computing, Paderborn, North Rhine-Westphalia, Germany"}]}],"member":"320","published-online":{"date-parts":[[2021,11,29]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Intel. 2020. Intel\u00ae 64 and IA-32 Architectures Optimization Reference Manual. (May 2020)."},{"key":"e_1_3_2_3_2","volume-title":"The Landscape of Parallel Computing Research: A View from Berkeley","author":"Asanovic Krste","year":"2006","unstructured":"Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. 2006. The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report. University of California at Berkeley."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1038\/324446a0"},{"key":"e_1_3_2_5_2","first-page":"8","volume-title":"Proceedings of the International Conference on High-performance Computing","author":"Berczik Peter","year":"2011","unstructured":"Peter Berczik, Keigo Nitadori, Shiyan Zhong, Rainer Spurzem, Tsuyoshi Hamada, Xiaowei Wang, Ingo Berentzen, Alexander Veles, and Wei Ge. 2011. High performance massively parallel direct N-body simulations on large GPU clusters. In Proceedings of the International Conference on High-performance Computing. 8\u201318."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-38750-0_2"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195647"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1063\/1.464397"},{"key":"e_1_3_2_9_2","first-page":"1","volume-title":"Proceedings of the IEEE International Conference on Application-specific Systems, Architectures, and Processors (ASAP)","author":"Sozzo Emanuele Del","year":"2018","unstructured":"Emanuele Del Sozzo, Marco Rabozzi, Lorenzo Di Tucci, Donatella Sciuto, and Marco D. Santambrogio. 2018. A scalable FPGA design for cloud N-body simulation. 
In Proceedings of the IEEE International Conference on Application-specific Systems, Architectures, and Processors (ASAP). 1\u20138. DOI: 10.1109\/ASAP.2018.8445106"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1063\/1.470117"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.5555\/3307441.3307446"},{"issue":"075102","key":"e_1_3_2_12_2","article-title":"An analytical approach to computing biomolecular electrostatic potential. II. Validation and applications","volume":"129","author":"Gordon John C.","year":"2008","unstructured":"John C. Gordon, Andrew T. Fenley, and Alexey Onufriev. 2008. An analytical approach to computing biomolecular electrostatic potential. II. Validation and applications. J. Chem. Phys. 129, 075102 (August 2008). DOI: https:\/\/doi.org\/10.1063\/1.2956499","journal-title":"J. Chem. Phys."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT47387.2019.00020"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(87)90140-9"},{"key":"e_1_3_2_15_2","article-title":"The 2-Body Problem: Higher-order Integrators","author":"Hut Piet","year":"2007","unstructured":"Piet Hut and Jun Makino. 2007. The 2-Body Problem: Higher-order Integrators. Retrieved from http:\/\/www.artcompsci.org\/kali\/vol\/two_body_problem_2\/ch07.html.","journal-title":"Retrieved from http:\/\/www.artcompsci.org\/kali\/vol\/two_body_problem_2\/ch07.html"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3337801.3337813"},{"key":"e_1_3_2_17_2","article-title":"Intel FPGA SDK for OpenCL Pro Edition Best Practices Guide (UG-OCL003 | 2020.12.14, Version 20.4)","year":"2020","unstructured":"Intel. 2020. Intel FPGA SDK for OpenCL Pro Edition Best Practices Guide (UG-OCL003 | 2020.12.14, Version 20.4). 
Retrieved from https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/opencl-sdk\/archives\/aocl-best-practices-guide-20-4.pdf.","journal-title":"Retrieved from https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/opencl-sdk\/archives\/aocl-best-practices-guide-20-4.pdf"},{"key":"e_1_3_2_18_2","article-title":"Intel Stratix 10 Variable Precision DSP Blocks User Guide (UG-S10-DSP | 2020.09.28, Version 20.3)","year":"2020","unstructured":"Intel. 2020. Intel Stratix 10 Variable Precision DSP Blocks User Guide (UG-S10-DSP | 2020.09.28, Version 20.3). Retrieved from https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/stratix-10\/ug-s10-dsp.pdf.","journal-title":"Retrieved from https:\/\/www.intel.com\/content\/dam\/www\/programmable\/us\/en\/pdfs\/literature\/hb\/stratix-10\/ug-s10-dsp.pdf"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL50879.2020.00031"},{"key":"e_1_3_2_20_2","first-page":"1077","volume-title":"Proceedings of the International Symposium on Parallel and Distributed Processing (IPDPS)","author":"Karp Martin","year":"2021","unstructured":"Martin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, and Stefano Markidis. 2021. High-performance spectral element methods on field-programmable gate arrays. In Proceedings of the International Symposium on Parallel and Distributed Processing (IPDPS). 1077\u20131086. DOI: https:\/\/doi.org\/10.1109\/IPDPS49936.2021.00116"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3492805.3492808"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.23919\/FPL.2017.8056844"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468267.3470617"},{"key":"e_1_3_2_24_2","first-page":"124","article-title":"New features of parallel implementation of N-body problems on GPU","author":"Khrapov S. 
S.","year":"2018","unstructured":"S. S. Khrapov, S. A. Khoperskov, and A. V. Khoperskov. 2018. New features of parallel implementation of N-body problems on GPU. Bull. South Ural State Univ., Series: Math. Model., Program. Comput. Softw. 1 (2018), 124\u2013136. DOI: https:\/\/doi.org\/10.14529\/mmp180111","journal-title":"Bull. South Ural State Univ., Series: Math. Model., Program. Comput. Softw."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00048986"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3149457.3149479"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jctc.0c00744"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11265-015-1051-z"},{"key":"e_1_3_2_29_2","first-page":"141","article-title":"On a Hermite integrator with Ahmad-Cohen scheme for gravitational many-body problems","volume":"44","author":"Makino Junichiro","year":"1992","unstructured":"Junichiro Makino and Sverre J. Aarseth. 1992. On a Hermite integrator with Ahmad-Cohen scheme for gravitational many-body problems. Publicat. Astron. Societ. Japan 44 (1992), 141\u2013151.","journal-title":"Publicat. Astron. Societ. 
Japan"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389137"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.5555\/3433701.3433779"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1093\/pasj\/63.4.881"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.5555\/1971951"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(80)90262-4"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2678373.2665678"},{"issue":"134110","key":"e_1_3_2_36_2","first-page":"1","article-title":"Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS","volume":"153","author":"P\u00e1ll Szil\u00e1rd","year":"2020","unstructured":"Szil\u00e1rd P\u00e1ll, Artem Zhmurov, Paul Bauer, Mark Abraham, Magnus Lundborg, Alan Gray, Berk Hess, and Erik Lindahl. 2020. Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS. J. Chem. Phys. 153, 134110 (2020), 1\u201315. DOI: https:\/\/doi.org\/10.1063\/5.0018516","journal-title":"J. Chem. Phys."},{"key":"e_1_3_2_37_2","first-page":"285","article-title":"Evaluating the design space for offloading 3D FFT calculations to an FPGA for high-performance computing","volume":"12700","author":"Ramaswami Arjun","year":"2021","unstructured":"Arjun Ramaswami, Tobias Kenter, Thomas D. K\u00fchne, and Christian Plessl. 2021. Evaluating the design space for offloading 3D FFT calculations to an FPGA for high-performance computing. Appl. Reconfig. Comput. Archit. Tools Applic. 12700 (2021), 285\u2013294.","journal-title":"Appl. Reconfig. Comput. Archit. Tools Applic."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.51"},{"key":"e_1_3_2_39_2","article-title":"The cosmological simulation code GADGET-2","volume":"364","author":"Springel Volker","year":"2005","unstructured":"Volker Springel. 2005. The cosmological simulation code GADGET-2. Month. Not. Roy. 
Astron. Societ. 364 (2005).","journal-title":"Month. Not. Roy. Astron. Societ."},{"key":"e_1_3_2_40_2","article-title":"An OpenCL 3D FFT for molecular dynamics simulations on multiple FPGAs","author":"Stewart Lawrence C.","year":"2020","unstructured":"Lawrence C. Stewart, Carlo Pasoe, Brian W. Sherman, Martin Herbordt, and Vipin Sachdeva. 2020. An OpenCL 3D FFT for molecular dynamics simulations on multiple FPGAs. arXiv preprint arXiv:2009.12617 (2020).","journal-title":"arXiv preprint arXiv:2009.12617"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.5555\/1025123.1025817"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1093\/mnras\/stv817"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2011.12.013"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356179"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/0375-9601(90)90092-3"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174248"}],"container-title":["ACM Transactions on Reconfigurable Technology and 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3491235","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3491235","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:19Z","timestamp":1750183759000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3491235"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,29]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,3,31]]}},"alternative-id":["10.1145\/3491235"],"URL":"https:\/\/doi.org\/10.1145\/3491235","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2021,11,29]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}