{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T07:02:22Z","timestamp":1761807742486},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2012,4,18]],"date-time":"2012-04-18T00:00:00Z","timestamp":1334707200000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2012,11]]},"DOI":"10.1007\/s11227-012-0764-z","type":"journal-article","created":{"date-parts":[[2012,4,17]],"date-time":"2012-04-17T14:54:27Z","timestamp":1334674467000},"page":"946-966","source":"Crossref","is-referenced-by-count":18,"title":["Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters"],"prefix":"10.1007","volume":"62","author":[{"given":"Hikmet","family":"Dursun","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Manaschai","family":"Kunaseth","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ken-ichi","family":"Nomura","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jacqueline","family":"Chame","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert F.","family":"Lucas","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chun","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mary","family":"Hall","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rajiv 
K.","family":"Kalia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aiichiro","family":"Nakano","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Priya","family":"Vashishta","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2012,4,18]]},"reference":[{"key":"764_CR1","volume-title":"Proceedings of the 2008 international conference for high performance computing, networking, storage and analysis","author":"KJ Barker","year":"2008","unstructured":"Barker KJ, Davis K, Hoisie A, Kerbyson DJ, Lang M, Pakin S, Sancho JC (2008) Entering the petaflop era: the architecture and performance of Roadrunner. In: Proceedings of the 2008 international conference for high performance computing, networking, storage and analysis, Austin, Texas. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR2","volume-title":"Proceedings of the 2008 international conference for high performance computing, networking, storage and analysis","author":"L Carrington","year":"2008","unstructured":"Carrington L, Komatitsch D, Laurenzano M, Tikir MM, Michea D, Goff NL, Snavely A, Tromp J (2008) High-frequency simulations of global seismic wave propagation using SPECFEM3D_GLOBE on 62K processors. In: Proceedings of the 2008 international conference for high performance computing, networking, storage and analysis, Austin, Texas. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR3","doi-asserted-by":"crossref","first-page":"7692","DOI":"10.1016\/j.jcp.2010.06.024","volume":"229","author":"D Komatitsch","year":"2010","unstructured":"Komatitsch D, Erlebacher G, G\u00f6ddeke D, Mich\u00e9a D (2010) High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster. 
J Comput Phys 229:7692\u20137714","journal-title":"J Comput Phys"},{"key":"764_CR4","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.jcp.2004.03.008","volume":"200","author":"S Zhao","year":"2004","unstructured":"Zhao S, Wei GW (2004) High-order FDTD methods via derivative matching for Maxwell\u2019s equations with material interfaces. J Comput Phys 200:60\u2013103","journal-title":"J Comput Phys"},{"key":"764_CR5","volume-title":"Proceedings of the 2010 international conference for high performance computing, networking, storage and analysis","author":"A Nguyen","year":"2010","unstructured":"Nguyen A, Satish N, Chhugani J, Kim C, Dubey P (2010) 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: Proceedings of the 2010 international conference for high performance computing, networking, storage and analysis, New Orleans, Louisiana. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR6","volume-title":"Proceedings of the 2009 IEEE international computer software and applications conference","author":"G Wellein","year":"2009","unstructured":"Wellein G, Hager G, Zeiser T, Wittmann M, Fehske H (2009) Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization. In: Proceedings of the 2009 IEEE international computer software and applications conference, Seattle, Washington. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR7","doi-asserted-by":"crossref","first-page":"762","DOI":"10.1016\/j.jpdc.2009.04.002","volume":"69","author":"S Williams","year":"2009","unstructured":"Williams S, Carter J, Oliker L, Shalf J, Yelick K (2009) Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms. 
J Parallel Distrib Comput 69:762\u2013777","journal-title":"J Parallel Distrib Comput"},{"key":"764_CR8","volume-title":"Proceedings of the 2008 international conference for high performance computing, networking, storage and analysis","author":"K Datta","year":"2008","unstructured":"Datta K, Murphy M, Volkov V, Williams S, Carter J, Oliker L, Patterson D, Shalf J, Yelick K (2008) Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: Proceedings of the 2008 international conference for high performance computing, networking, storage and analysis, Austin, Texas. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR9","volume-title":"Proceedings of the 23rd IEEE international parallel and distributed processing symposium","author":"L Peng","year":"2009","unstructured":"Peng L, Seymour R, Nomura K, Kalia RK, Nakano A, Vashishta P, Loddoch A, Netzband M, Volz WR, Wong CC (2009) High-order stencil computations on multicore clusters. In: Proceedings of the 23rd IEEE international parallel and distributed processing symposium, Rome, Italy. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR10","volume-title":"Proceedings of the 2000 international conference for high performance computing, networking, storage and analysis","author":"G Rivera","year":"2000","unstructured":"Rivera G, Tseng C-W (2000) Tiling optimizations for 3D scientific computations. In: Proceedings of the 2000 international conference for high performance computing, networking, storage and analysis, Dallas, Texas. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR11","volume-title":"Proceedings of the 2005 international conference on supercomputing","author":"M Frigo","year":"2005","unstructured":"Frigo M, Strumpen V (2005) Cache oblivious stencil computations. In: Proceedings of the 2005 international conference on supercomputing, Cambridge, Massachusetts. 
IEEE Comput Soc, Los Alamitos"},{"key":"764_CR12","volume-title":"Proceedings of the 14th IEEE international parallel and distributed processing symposium","author":"D Wonnacott","year":"2000","unstructured":"Wonnacott D (2000) Using time skewing to eliminate idle time due to memory bandwidth and network limitations. In: Proceedings of the 14th IEEE international parallel and distributed processing symposium, Cancun, Mexico"},{"key":"764_CR13","volume-title":"Proceedings of the 21st IEEE international parallel and distributed processing symposium","author":"L Renganarayana","year":"2007","unstructured":"Renganarayana L, Harthikote-Matha M, Dewri R, Rajopadhye SV (2007) Towards optimal multi-level tiling for stencil computations. In: Proceedings of the 21st IEEE international parallel and distributed processing symposium, Long Beach, California. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR14","volume-title":"Proceedings of the 15th international Euro-Par conference on parallel processing","author":"H Dursun","year":"2009","unstructured":"Dursun H, Nomura K, Peng L, Seymour R, Wang W, Kalia RK, Nakano A, Vashishta P (2009) A\u00a0multilevel parallelization framework for high-order stencil computations. In: Proceedings of the 15th international Euro-Par conference on parallel processing, Delft, The Netherlands. Springer, Berlin"},{"key":"764_CR15","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1002\/mmce.20232","volume":"17","author":"G Shen","year":"2007","unstructured":"Shen G, Cangellaris AC (2007) A new FDTD stencil for reduced numerical anisotropy in the computer modeling of wave phenomena: research articles. 
Int J RF Microw Comput-Aided Eng 17:447\u2013454","journal-title":"Int J RF Microw Comput-Aided Eng"},{"key":"764_CR16","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/0010-4655(94)90048-5","volume":"83","author":"A Nakano","year":"1994","unstructured":"Nakano A, Vashishta P, Kalia RK (1994) Multiresolution molecular dynamics algorithm for realistic materials modeling on parallel computers. Comput Phys Commun 83:197\u2013214","journal-title":"Comput Phys Commun"},{"key":"764_CR17","volume-title":"Proceedings of the 2007 DoD high performance computing modernization program users group conference","author":"M Parker","year":"2007","unstructured":"Parker M, Ketcham S, Cudney H (2007) Acoustic wave propagation in urban environments. In: Proceedings of the 2007 DoD high performance computing modernization program users group conference, Pittsburgh, Pennsylvania. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR18","volume-title":"3rd workshop on high performance computational finance (WHPCF), in conjunction with the 2010 international conference for high performance computing, networking, storage and analysis","author":"DM Dang","year":"2010","unstructured":"Dang DM, Christara CC, Jackson KR (2010) Pricing multi-asset American options on graphics processing units using a PDE approach. In: 3rd workshop on high performance computational finance (WHPCF), in conjunction with the 2010 international conference for high performance computing, networking, storage and analysis, New Orleans, Louisiana. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR19","unstructured":"PARKBENCH: PARallel kernels and BENCHmarks. Available from http:\/\/www.netlib.org\/parkbench"},{"key":"764_CR20","volume-title":"The NAS parallel benchmarks","author":"D Bailey","year":"1991","unstructured":"Bailey D, Barton J, Lasinski T, Simon H (1991) The NAS parallel benchmarks. 
NASA Ames Research Center, Moffett Field"},{"key":"764_CR21","volume-title":"Proceedings of the ACM SIGPLAN 1991 conference on programming language design and implementation","author":"M Bromley","year":"1991","unstructured":"Bromley M, Heller S, McNerney T, Steele JGL (1991) Fortran at ten gigaflops: the connection machine convolution compiler. In: Proceedings of the ACM SIGPLAN 1991 conference on programming language design and implementation, Toronto, Ontario, Canada. ACM, New York"},{"key":"764_CR22","volume-title":"Proceedings of the 1997 international conference for high performance computing, networking, storage and analysis","author":"G Roth","year":"1997","unstructured":"Roth G, Mellor-Crummey J, Kennedy K, Brickner RG (1997) Compiling stencils in high performance Fortran. In: Proceedings of the 1997 international conference for high performance computing, networking, storage and analysis, San Jose, CA. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR23","volume-title":"Proceedings of the 1996 international conference for high performance computing, networking, storage and analysis","author":"R Bordawekar","year":"1996","unstructured":"Bordawekar R, Choudhary A, Ramanujam J (1996) Automatic optimization of communication in compiling out-of-core stencil codes. In: Proceedings of the 1996 international conference for high performance computing, networking, storage and analysis, Philadelphia, Pennsylvania, United States. IEEE Comput Soc, Los Alamitos"},{"key":"764_CR24","volume-title":"Proceedings of the 2002 conference on Asia south pacific design automation\/VLSI design","author":"J Ramanujam","year":"2002","unstructured":"Ramanujam J, Krishnamurthy S, Hong J, Kandemir M (2002) Address code and arithmetic optimizations for embedded systems. In: Proceedings of the 2002 conference on Asia south pacific design automation\/VLSI design, Bangalore, India. 
IEEE Comput Soc, Los Alamitos"},{"key":"764_CR25","doi-asserted-by":"crossref","DOI":"10.1103\/PhysRevB.77.085103","volume":"77","author":"F Shimojo","year":"2008","unstructured":"Shimojo F, Kalia RK, Nakano A, Vashishta P (2008) Divide-and-conquer density functional theory on hierarchical real-space grids: parallel implementation and applications. Phys Rev B, Condens Matter Mater Phys 77:085103","journal-title":"Phys Rev B, Condens Matter Mater Phys"},{"key":"764_CR26","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1109\/5992.852388","volume":"2","author":"A Stathopoulos","year":"2000","unstructured":"Stathopoulos A, \u00d6\u011f\u00fct S, Saad Y, Chelikowsky JR, Kim H (2000) Parallel methods and tools for predicting material properties. Comput Sci Eng 2:19\u201332","journal-title":"Comput Sci Eng"},{"key":"764_CR27","volume-title":"MPI: the complete reference: the MPI core","author":"M Snir","year":"1998","unstructured":"Snir M, Otto S (1998) MPI: the complete reference: the MPI core. MIT Press, Cambridge"},{"key":"764_CR28","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1145\/989393.989437","volume":"39","author":"MS Lam","year":"2004","unstructured":"Lam MS, Wolf ME (2004) A data locality optimizing algorithm. ACM SIGPLAN Not 39:442\u2013459","journal-title":"ACM SIGPLAN Not"},{"key":"764_CR29","unstructured":"Chen C, Chame J, Hall M (2008) CHiLL: a\u00a0framework for composing high-level loop transformations. 
USC computer science technical report"},{"key":"764_CR30","unstructured":"IBM (2008) IBM system Blue Gene Solution: Blue Gene\/P application development"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-012-0764-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s11227-012-0764-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-012-0764-z","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,6,1]],"date-time":"2019-06-01T10:24:06Z","timestamp":1559384646000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s11227-012-0764-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4,18]]},"references-count":30,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2012,11]]}},"alternative-id":["764"],"URL":"https:\/\/doi.org\/10.1007\/s11227-012-0764-z","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,4,18]]}}}