{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T06:50:26Z","timestamp":1761807026893,"version":"3.30.2"},"reference-count":14,"publisher":"World Scientific Pub Co Pte Ltd","issue":"04","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Parallel Process. Lett."],"published-print":{"date-parts":[[2003,12]]},"abstract":"<jats:p>When designing and implementing highly efficient scientific applications for parallel computers such as clusters of workstations, it is inevitable to consider and to optimize the single-CPU performance of the codes. For this purpose, it is particularly important that the codes respect the hierarchical memory designs that computer architects employ in order to hide the effects of the growing gap between CPU performance and main memory speed. In this article, we present techniques to enhance the single-CPU efficiency of lattice Boltzmann methods which are commonly used in computational fluid dynamics. We show various performance results for both 2D and 3D codes in order to emphasize the effectiveness of our optimization techniques.<\/jats:p>","DOI":"10.1142\/s0129626403001501","type":"journal-article","created":{"date-parts":[[2004,3,5]],"date-time":"2004-03-05T11:52:36Z","timestamp":1078487556000},"page":"549-560","source":"Crossref","is-referenced-by-count":86,"title":["OPTIMIZATION AND PROFILING OF THE CACHE PERFORMANCE OF PARALLEL LATTICE BOLTZMANN CODES"],"prefix":"10.1142","volume":"13","author":[{"given":"THOMAS","family":"POHL","sequence":"first","affiliation":[{"name":"System Simulation Group, Dept. of Computer Science, University of Erlangen-Nuremberg, Cauerstra\u00dfe 6, D-91058 Erlangen, Germany"}]},{"given":"MARKUS","family":"KOWARSCHIK","sequence":"additional","affiliation":[{"name":"System Simulation Group, Dept. of Computer Science, University of Erlangen-Nuremberg, Cauerstra\u00dfe 6, D-91058 Erlangen, Germany"}]},{"given":"JENS","family":"WILKE","sequence":"additional","affiliation":[{"name":"System Simulation Group, Dept. of Computer Science, University of Erlangen-Nuremberg, Cauerstra\u00dfe 6, D-91058 Erlangen, Germany"}]},{"given":"KLAUS","family":"IGLBERGER","sequence":"additional","affiliation":[{"name":"System Simulation Group, Dept. of Computer Science, University of Erlangen-Nuremberg, Cauerstra\u00dfe 6, D-91058 Erlangen, Germany"}]},{"given":"ULRICH","family":"R\u00dcDE","sequence":"additional","affiliation":[{"name":"System Simulation Group, Dept. of Computer Science, University of Erlangen-Nuremberg, Cauerstra\u00dfe 6, D-91058 Erlangen, Germany"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"volume-title":"Optimizing Compilers for Modern Architectures","year":"2001","author":"Allen R.","key":"rf1"},{"key":"rf4","doi-asserted-by":"publisher","DOI":"10.1177\/109434200001400303"},{"key":"rf5","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.fluid.30.1.329"},{"key":"rf6","first-page":"21","volume":"10","author":"Douglas C. C.","journal-title":"Electronic Transactions on Numerical Analysis"},{"key":"rf7","doi-asserted-by":"crossref","unstructured":"M.\u00a0Frigo and S. G.\u00a0Johnson, Proc. of the Int. Conf. on Acoustics, Speech, and Signal Processing\u00a03 (Seattle, WA, USA, 1998)\u00a0pp. 1381\u20131384.","DOI":"10.1109\/ICASSP.1998.681704"},{"key":"rf8","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718218"},{"key":"rf9","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898719703"},{"volume-title":"The Cache Memory Book","year":"1998","author":"Handy J.","key":"rf10"},{"volume-title":"Cache Optimization for the Lattice Boltzmann Method in 3D","year":"2003","author":"Iglberger K.","key":"rf11"},{"key":"rf13","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-36574-5_10"},{"volume-title":"Proc. of the ACM\/IEEE Supercomputing Conf.","year":"1998","author":"Whaley R. C.","key":"rf15"},{"volume-title":"Cache Optimizations for the Lattice Boltzmann Method in 2D","year":"2003","author":"Wilke J.","key":"rf16"},{"key":"rf17","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45209-6_66"},{"key":"rf18","doi-asserted-by":"publisher","DOI":"10.1007\/b72010"}],"container-title":["Parallel Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0129626403001501","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,15]],"date-time":"2024-12-15T13:59:14Z","timestamp":1734271154000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0129626403001501"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2003,12]]},"references-count":14,"journal-issue":{"issue":"04","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[2003,12]]}},"alternative-id":["10.1142\/S0129626403001501"],"URL":"https:\/\/doi.org\/10.1142\/s0129626403001501","relation":{},"ISSN":["0129-6264","1793-642X"],"issn-type":[{"type":"print","value":"0129-6264"},{"type":"electronic","value":"1793-642X"}],"subject":[],"published":{"date-parts":[[2003,12]]}}}