{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T01:18:58Z","timestamp":1771809538068,"version":"3.50.1"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2013,2,1]],"date-time":"2013-02-01T00:00:00Z","timestamp":1359676800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000105","name":"Office of Cyberinfrastructure","doi-asserted-by":"publisher","award":["OCI-0850680, OCI-0850750"],"award-info":[{"award-number":["OCI-0850680, OCI-0850750"]}],"id":[{"id":"10.13039\/100000105","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2013,2]]},"abstract":"<jats:p>We present a novel finite element integration method for low-order elements on GPUs. 
We achieve more than 100GF for element integration on first order discretizations of both the Laplacian and Elasticity operators on an NVIDIA GTX285, which has a nominal single precision peak flop rate of 1 TF\/s and bandwidth of 159 GB\/s, corresponding to a bandwidth limited peak of 40 GF\/s.<\/jats:p>","DOI":"10.1145\/2427023.2427027","type":"journal-article","created":{"date-parts":[[2013,2,22]],"date-time":"2013-02-22T19:25:04Z","timestamp":1361561104000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Finite Element Integration on GPUs"],"prefix":"10.1145","volume":"39","author":[{"given":"Matthew G.","family":"Knepley","sequence":"first","affiliation":[{"name":"University of Chicago"}]},{"given":"Andy R.","family":"Terrel","sequence":"additional","affiliation":[{"name":"University of Texas at Austin"}]}],"member":"320","published-online":{"date-parts":[[2013,2]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Aln\u00e6s M. S. and Logg A. 2009. UFL Specification and User Manual. Simula Research. https:\/\/launchpad.net\/ufl."},{"key":"e_1_2_1_2_1","unstructured":"Bayer M. 2010. The Mako templating system. http:\/\/www.makotemplates.org\/."},{"key":"e_1_2_1_3_1","unstructured":"Bell N. and Garland M. 2010. The Cusp library. http:\/\/code.google.com\/p\/cusp-library\/."},{"key":"e_1_2_1_4_1","unstructured":"Bell N. and Hoberock J. 2010. The Thrust library. http:\/\/code.google.com\/p\/thrust\/."},{"key":"e_1_2_1_5_1","unstructured":"Brown J. 2011. Private communication with code sample."},
{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1002\/nme.2989"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2009.144"},{"key":"e_1_2_1_8_1","first-page":"1","article-title":"Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proceedings of the ACM\/IEEE Conference on Supercomputing (SC\u201908)","volume":"4","author":"Datta K.","year":"2008","journal-title":"IEEE"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1163641.1163644"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/040607824"},{"key":"e_1_2_1_11_1","unstructured":"Kl\u00f6ckner A. 2011. Loo.py. unpublished loop slicing tool."},{"key":"e_1_2_1_12_1","unstructured":"Kl\u00f6ckner A. Pinto N. Lee Y. Catanzaro B. Ivanov P. and Fasih A. 2009. PyCUDA: GPU run-time code generation for high-performance computing. http:\/\/arxiv.org\/abs\/0911.3456v1."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2009.06.041"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2009.01.006"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Logg A. Mardal K.-A. and Wells G. N. 2012. Automated solution of differential equations by the finite element method: The fenics book. http:\/\/fenicsproject.org\/book\/.","DOI":"10.1007\/978-3-642-23099-8"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the International Conference of Numerical Analysis and Applied Mathematics. 
American Institute of Physics Conference Series","volume":"1281","author":"Markall G."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the International Parallel and Distributed Processing Symposium. IEEE, 1--12","author":"Maruyama N."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the IEEE International Symposium on Parallel & Distributed Processing. IEEE, 1--11","author":"Murthy G."},{"key":"e_1_2_1_19_1","volume-title":"NVIDIA CUDA Compute Unified Device Architecture Programming Guide","author":"NVIDIA Corporation"},{"key":"e_1_2_1_20_1","volume-title":"NVIDIA CUBLAS User Guide","author":"NVIDIA Corporation"},{"key":"e_1_2_1_21_1","volume-title":"NVIDIA CUSPARSE User Guide","author":"NVIDIA Corporation"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1137\/070710032"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1345206.1345220"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMI.2007.913112"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v22:16"}],"container-title":["ACM Transactions on Mathematical 
Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2427023.2427027","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2427023.2427027","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:49:00Z","timestamp":1750236540000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2427023.2427027"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,2]]},"references-count":25,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2013,2]]}},"alternative-id":["10.1145\/2427023.2427027"],"URL":"https:\/\/doi.org\/10.1145\/2427023.2427027","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"value":"0098-3500","type":"print"},{"value":"1557-7295","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,2]]},"assertion":[{"value":"2011-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-02-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}