{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:14:28Z","timestamp":1763468068799,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2012,5,1]],"date-time":"2012-05-01T00:00:00Z","timestamp":1335830400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation","doi-asserted-by":"publisher","award":["1981"],"award-info":[{"award-number":["1981"]}],"id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000144","name":"Division of Computer and Network Systems","doi-asserted-by":"publisher","award":["CNS-0929947CCF-0833136CAREER-0953100OCI-0749285OCI-0749334OCI-1047980CNS-0903447"],"award-info":[{"award-number":["CNS-0929947CCF-0833136CAREER-0953100OCI-0749285OCI-0749334OCI-1047980CNS-0903447"]}],"id":[{"id":"10.13039\/100000144","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000143","name":"Division of Computing and Communication Foundations","doi-asserted-by":"publisher","award":["CNS-0929947CCF-0833136CAREER-0953100OCI-0749285OCI-0749334OCI-1047980CNS-0903447"],"award-info":[{"award-number":["CNS-0929947CCF-0833136CAREER-0953100OCI-0749285OCI-0749334OCI-1047980CNS-0903447"]}],"id":[{"id":"10.13039\/100000143","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000105","name":"Office of 
Cyberinfrastructure","doi-asserted-by":"publisher","award":["CNS-0929947CCF-0833136CAREER-0953100OCI-0749285OCI-0749334OCI-1047980CNS-0903447"],"award-info":[{"award-number":["CNS-0929947CCF-0833136CAREER-0953100OCI-0749285OCI-0749334OCI-1047980CNS-0903447"]}],"id":[{"id":"10.13039\/100000105","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["ASC-070050NCCR-090024ASC-100019MCA-04N026","CNS-0929947CCF-0833136CAREER-0953100OCI-0749285OCI-0749334OCI-1047980CNS-0903447"],"award-info":[{"award-number":["ASC-070050NCCR-090024ASC-100019MCA-04N026","CNS-0929947CCF-0833136CAREER-0953100OCI-0749285OCI-0749334OCI-1047980CNS-0903447"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DEFC02-10ER26006\/DE-SC0004915"],"award-info":[{"award-number":["DEFC02-10ER26006\/DE-SC0004915"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2012,5]]},"abstract":"<jats:p>We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed memory parallelism (via MPI) and shared memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high performance computing architectures. We have performed scalability tests with up to 30 billion particles on 196,608 cores on the AMD\/CRAY-based Jaguar system at ORNL. On a GPU-enabled system (NSF's Keeneland at Georgia Tech\/ORNL), we observed 30\u00d7 speedup over a single core CPU and 7\u00d7 speedup over a multicore CPU implementation. 
By combining GPUs with MPI, we achieve less than 10 ns\/particle and six digits of accuracy for a run with 48 million nonuniformly distributed particles on 192 GPUs.<\/jats:p>","DOI":"10.1145\/2160718.2160740","type":"journal-article","created":{"date-parts":[[2012,4,24]],"date-time":"2012-04-24T18:41:10Z","timestamp":1335292870000},"page":"101-109","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":57,"title":["A massively parallel adaptive fast multipole method on heterogeneous architectures"],"prefix":"10.1145","volume":"55","author":[{"given":"Ilya","family":"Lashuk","sequence":"first","affiliation":[{"name":"Lawrence Livermore National Laboratory, Livermore, CA"}]},{"given":"Aparna","family":"Chandramowlishwaran","sequence":"additional","affiliation":[{"name":"College of Computing, Atlanta, GA"}]},{"given":"Harper","family":"Langston","sequence":"additional","affiliation":[{"name":"College of Computing, Atlanta, GA"}]},{"given":"Tuan-Anh","family":"Nguyen","sequence":"additional","affiliation":[{"name":"College of Computing, Atlanta, GA"}]},{"given":"Rahul","family":"Sampath","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN"}]},{"given":"Aashay","family":"Shringarpure","sequence":"additional","affiliation":[]},{"given":"Richard","family":"Vuduc","sequence":"additional","affiliation":[{"name":"College of Computing, Atlanta, GA"}]},{"given":"Lexing","family":"Ying","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, TX"}]},{"given":"Denis","family":"Zorin","sequence":"additional","affiliation":[{"name":"New York University, New York, NY"}]},{"given":"George","family":"Biros","sequence":"additional","affiliation":[{"name":"The University of Texas at Austin, TX"}]}],"member":"320","published-online":{"date-parts":[[2012,5]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"A hierarchical O(N logN) force-calculation algorithm. 
Nature 324, 4 (December","author":"Barnes J.","year":"1986","unstructured":"Barnes, J., Hut, P. A hierarchical O(N logN) force-calculation algorithm. Nature 324, 4 (December 1986), 446--449."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/313651.313705"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1137\/0909044"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827500367609"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.crme.2010.12.005"},{"key":"e_1_2_1_6_1","volume-title":"An Introduction to Parallel Computing: Design and Analysis of Algorithms","author":"Grama A.","year":"2003","unstructured":"Grama, A., Gupta, A., Karypis, G., Kumar, V. An Introduction to Parallel Computing: Design and Analysis of Algorithms, 2nd edn, Addison Wesley, 2003.","edition":"2"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/602770.602846"},{"key":"e_1_2_1_8_1","unstructured":"Gray A. Moore A. N-Body' problems in statistical learning. Adv. Neural Inform. Process Syst. 
(2001) 521--527."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/0898-1221(90)90349-O"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(87)90140-9"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2008.05.023"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654123"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2004.12.007"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/133889"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2005.02.001"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.265.5174.909"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654118"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/256292.256294"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0010-4655(03)00246-7"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/1413370.1413379"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.42"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(85)90002-6"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/846234.849337"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1137\/070681727"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595288942"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1051\/0004-6361\/200810657"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/169627.169640"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2003.11.021"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1048935.1050165"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2011.02.013"}],"container-title":["Communications of the 
ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2160718.2160740","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2160718.2160740","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:40Z","timestamp":1750241200000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2160718.2160740"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,5]]},"references-count":30,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2012,5]]}},"alternative-id":["10.1145\/2160718.2160740"],"URL":"https:\/\/doi.org\/10.1145\/2160718.2160740","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"type":"print","value":"0001-0782"},{"type":"electronic","value":"1557-7317"}],"subject":[],"published":{"date-parts":[[2012,5]]},"assertion":[{"value":"2012-05-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}