{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T23:00:36Z","timestamp":1777676436049,"version":"3.51.4"},"reference-count":24,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2010,1,11]],"date-time":"2010-01-11T00:00:00Z","timestamp":1263168000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2010,8]]},"abstract":"<jats:p>With the current shift of increasing the computational power of a processor by including multiple cores instead of increasing the clock frequency, consideration of computational efficiency is gaining increased importance for computational fluid dynamics codes. This is especially critical for applications that require high throughput. For example, applying computational fluid dynamics simulations to multi-disciplinary design optimization requires a large number of similar simulations with different input parameters. Therefore, a reduction in the runtime of the code can lead to large reduction in the design process. In our case study, a two-dimensional, block-structured computational fluid dynamics code was optimized for performance on machines with hierarchical memory systems. This paper illustrates the techniques applied to transform an initial version of the code to an optimized version that yielded performance improvements of 10% for very small cases to about 50% for large test cases that did not fit into the cache memory of the target processor. A detailed performance analysis of the code starting at the global level down to subroutines and data structures is presented in this paper. The performance improvements can be explained through a reduction of cache misses in all levels of the memory hierarchy. The L1 cache misses were reduced by about 50%, the L2 cache misses by about 80% and the translation lookaside buffer misses by about 90% for the optimized version of the code. The code performance was also evaluated for multi-core processors, where efficiency is especially important when several instances of an application are running simultaneously. In this case, the most optimized version, a blocked version of the optimized code, more effectively maintained efficiency as more cores were activated compared to the unblocked version. This illustrates that optimizing cache performance may be increasingly important as the number of cores per processor continues to rise.<\/jats:p>","DOI":"10.1177\/1094342009358413","type":"journal-article","created":{"date-parts":[[2010,1,11]],"date-time":"2010-01-11T21:00:51Z","timestamp":1263243651000},"page":"299-318","source":"Crossref","is-referenced-by-count":2,"title":["Optimization of a Computational Fluid Dynamics Code for the Memory Hierarchy: A Case Study"],"prefix":"10.1177","volume":"24","author":[{"given":"Thomas","family":"Hauser","sequence":"first","affiliation":[{"name":"ACADEMIC & RESEARCH TECHNOLOGIES, NORTHWESTERN UNIVERSITY, 1970 CAMPUS DRIVE, EVANSTON, IL, 60208, USA,"}]},{"given":"Raymond","family":"LeBeau","sequence":"additional","affiliation":[{"name":"PHYSICS AND ASTRONOMY DEPARTMENT, UNIVERSITY OF KENTUCKY, LEXINGTON, KY, USA"}]}],"member":"179","published-online":{"date-parts":[[2010,1,11]]},"reference":[{"key":"atypb1","volume-title":"Proceedings 9th International Euro-Par Conference","author":"Bell, R."},{"key":"atypb2","doi-asserted-by":"publisher","DOI":"10.1177\/109434200001400303"},{"key":"atypb3","volume-title":"Proceedings 46th AIAA Aerospace Sciences Meeting and Exhibit, AIAA-2008-477","author":"Camelli, F."},{"key":"atypb4","doi-asserted-by":"publisher","DOI":"10.1145\/1353522.1353531"},{"key":"atypb5","first-page":"2002","volume":"9","author":"Goto, K.","year":"2002","journal-title":"FLAME Working Note"},{"key":"atypb6","volume-title":"Proceedings 44th AIAA Aerospace Sciences Meeting and Exhibit","author":"Gupta, S."},{"key":"atypb7","volume-title":"Proceedings 45th AIAA Aerospace Sciences Meeting and Exhibit, AIAA-2007-1110","author":"Gupta, S."},{"key":"atypb8","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827502410530"},{"key":"atypb9","volume-title":"Computer Architecture: A Quantitative Approach","author":"Hennessy, J.","year":"1996","edition":"2"},{"key":"atypb10","doi-asserted-by":"publisher","DOI":"10.2514\/1.2255"},{"issue":"4","key":"atypb11","doi-asserted-by":"crossref","first-page":"1337","DOI":"10.2514\/1.27020","volume":"44","author":"Huang, L.","year":"2006","journal-title":"J. Aircraft"},{"key":"atypb12","volume-title":"Proceedings 43rd AIAA Aerospace Sciences Meeting and Exhibit, AIAA-2005-1380","author":"LeBeau, R."},{"key":"atypb13","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-0361-3_8"},{"key":"atypb14","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2006.09.005"},{"key":"atypb15","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2007.08.001"},{"key":"atypb16","doi-asserted-by":"publisher","DOI":"10.1177\/1094342006085020"},{"key":"atypb17","volume-title":"Proceedings 3rd AIAA Flow Control Conference","author":"Pern, N."},{"key":"atypb18","volume-title":"Linux x86 performance-monitoring counters driver","author":"Pettersson, M.","year":"2006"},{"key":"atypb19","volume-title":"Proceedings 37th Fluid AIAA Fluid Dynamics Conference and Exhibit, AIAA-2007-4100","author":"Reasor, D."},{"key":"atypb20","doi-asserted-by":"publisher","DOI":"10.2514\/3.8284"},{"key":"atypb21","doi-asserted-by":"publisher","DOI":"10.1177\/1094342006085024"},{"key":"atypb22","doi-asserted-by":"publisher","DOI":"10.1115\/1.1580159"},{"issue":"8","key":"atypb23","first-page":"910","volume":"35","author":"Wellein, G.","year":"2005","journal-title":"Comput. Fluids"},{"issue":"131","key":"atypb24","first-page":"97","author":"Whaley, R.","year":"1997","journal-title":"LAPACK Working Note No"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342009358413","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342009358413","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T08:18:54Z","timestamp":1777450734000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342009358413"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,11]]},"references-count":24,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2010,8]]}},"alternative-id":["10.1177\/1094342009358413"],"URL":"https:\/\/doi.org\/10.1177\/1094342009358413","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,1,11]]}}}