{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T06:14:29Z","timestamp":1757312069844,"version":"3.33.0"},"reference-count":36,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2010,10,26]],"date-time":"2010-10-26T00:00:00Z","timestamp":1288051200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2011,5]]},"abstract":"<jats:p> The emergence of multicore architectures and the chip industry\u2019s plan to roll out hundreds of cores per die sometime in the near future might have triggered the evolution of von Neumann architectures towards a parallel processing paradigm. The capability to have hundreds of cores per die is exciting, but how optimally we are able to utilize such a resource remains a challenge. Since there are no straightforward solutions we seek inspiration from relevant scientific processes. Cellular automata which are inherently decentralized and spatially extended structures provide a potential candidate among parallel processing alternatives. The availability of spatial parallelism on field programmable gate arrays make them the ideal platform to investigate cellular automata systems as potential parallel processing paradigms on multicore architectures. This article presents a massively parallel implementation for a floating-point-based cellular automata using special purpose hardware such as Field Programmable Gate Array (FPGAs). The challenge is to best map an application to the underlying many-core architecture and address issues such as inter-core communication, scalability, and flexibility both in terms of hardware and software. Maxwell \u2014 a 64-node FPGA supercomputer, is used for accelerator implementations that range from a single to a multiple FPGA-enabled system. A performance model is proposed and demonstrated to closely reproduce measured execution times. The performance model enables identification of the main sources of overhead and suggests improvements to the architecture and implementation of the lattice Boltzmann method and compute-bound cellular automata in general. Further, a 2 million cell 2DQ9 lattice Boltzmann method lattice with periodic boundary conditions, simulated using a multiple FPGA chip accelerator implementation, is presented. The performance model shows how the FPGA-enabled PC cluster is the preferred multiple FPGA organization over the multiple FPGA-based PC setup. Latency hiding is fully exploited for PC cluster-based system implementations and demonstrated using system profiling. <\/jats:p>","DOI":"10.1177\/1094342010383138","type":"journal-article","created":{"date-parts":[[2010,10,27]],"date-time":"2010-10-27T00:46:03Z","timestamp":1288140363000},"page":"193-204","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":15,"title":["Cellular Automata Simulations on a FPGA cluster"],"prefix":"10.1177","volume":"25","author":[{"given":"S.","family":"Murtaza","sequence":"first","affiliation":[{"name":"University of Amsterdam,"}]},{"given":"A.G.","family":"Hoekstra","sequence":"additional","affiliation":[{"name":"University of Amsterdam"}]},{"given":"P.M.A.","family":"Sloot","sequence":"additional","affiliation":[{"name":"University of Amsterdam"}]}],"member":"179","published-online":{"date-parts":[[2010,10,26]]},"reference":[{"volume-title":"The landscape of parallel computing research: a view from Berkeley. Study Report, Electrical Engineering and Computer Sciences","year":"2006","author":"Asanovic, K.","key":"atypb1"},{"key":"atypb2","doi-asserted-by":"publisher","DOI":"10.1145\/1562764.1562783"},{"key":"atypb3","doi-asserted-by":"crossref","unstructured":"Baxter, R., Booth, S., Bull, M., Cawood, G., Perry, J., Parsons, M., Simpson, A., Trew, A., McCormick, A., Smart, G., Smart, R., Cantle, A., Chamberlain, R. and Genest, G. ( 2007 a). Maxwell - a 64 FPGA Supercomputer. J. Amer. Helic. Soc. IEEE Computer Society, pp. 287-294.","DOI":"10.1109\/AHS.2007.71"},{"volume-title":"AHS \u201807: Second NASA\/ESA Conference on Adaptive Hardware Systems","author":"Baxter, R.","key":"atypb4"},{"key":"atypb5","doi-asserted-by":"publisher","DOI":"10.1109\/40.782564"},{"volume-title":"DAC \u201806: Proceedings of the 43rd annual Design Automation Conference","author":"Borkar, S.","key":"atypb6"},{"volume-title":"DAC \u201807: Proceedings of the 44th annual Design Automation Conference","author":"Borkar, S.","key":"atypb7"},{"journal-title":"10th Summer School on Computing Techniques in Physics: HPC in Science","year":"1994","author":"Bubak, M.","key":"atypb8"},{"volume-title":"Custom reconfigurable computing machine for high performance cellular automata processing","year":"2001","author":"Cappuccino, G.","key":"atypb9"},{"key":"atypb10","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511549755"},{"volume-title":"Cellular Automaton Modeling of Biological Pattern Formation","year":"2004","author":"Deutsch, A.","key":"atypb11"},{"key":"atypb12","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2008.65"},{"key":"atypb13","unstructured":"Gokhale, M., Rickett, C., Tripp, J.L., Hsu, C. and Scrofano, R. ( 2006). Promises and pitfalls of reconfigurable supercomputing . In: ERSA, pp. 11-20. Available online at http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.84.1204 ."},{"volume-title":"Cellular Automata","year":"1990","author":"Gutowitz, H.A.","key":"atypb14"},{"journal-title":"White Paper, Intel","year":"2006","author":"Held, J.","key":"atypb15"},{"key":"atypb16","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2007.79"},{"volume-title":"Modelling Complex Systems with Cellular Automata","year":"2010","author":"Hoekstra, A.","key":"atypb17"},{"key":"atypb18","doi-asserted-by":"publisher","DOI":"10.1142\/4702"},{"volume-title":"The 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines","author":"Kobori, T.","key":"atypb19"},{"key":"atypb20","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.01.008"},{"volume-title":"RSSI \u201807: Proceedings of the Third Annual Reconfigurable Systems Summer Institute","author":"Murtaza, S.","key":"atypb21"},{"volume-title":"FPL \u201807: 17th International Conference on Field Programmable Logic and Applications. IEEE","author":"Murtaza, S.","key":"atypb22"},{"volume-title":"HPRCTA \u201808: Second International Workshop on High-Performance Reconfigurable Computing Technology and Applications","author":"Murtaza, S.","key":"atypb23"},{"key":"atypb24","doi-asserted-by":"publisher","DOI":"10.1145\/1462586.1462592"},{"journal-title":"ACM Trans. Reconfig. Technol. Syst., submitted","year":"2009","author":"Murtaza, S.","key":"atypb25"},{"key":"atypb26","volume":"309","author":"Shand, D.","year":"2005","journal-title":"White Paper"},{"key":"atypb27","doi-asserted-by":"publisher","DOI":"10.1007\/BF00936946"},{"volume-title":"6th International Conference on Cellular Automata for Research and Industry, ACRI 2004","author":"Sloot, P.M.A.","key":"atypb28"},{"key":"atypb29","doi-asserted-by":"crossref","unstructured":"Sloot, P.M.A. and Hoekstra, A.G. ( 2001). Cellular automata as a mesoscopic approach to model and simulate complex systems. In: Lecture Notes in Computer Science . Springer Verlag, Vol-2073\/2001, p. 518.","DOI":"10.1007\/3-540-45545-0_61"},{"volume-title":"Modeling dynamical systems with cellular automata","year":"2007","author":"Sloot, P.M.A.","key":"atypb30"},{"key":"atypb31","unstructured":"Sloot, P.M.A., Kaandorp, J.A., Hoekstra, A.G. and Overeinder, B.J. ( 2001 a). Distributed Cellular Automata: large scale simulation of natural phenomena. In: Solutions to Parallel and Distributed Computing Problems, Lessons from Biological Sciences. Computer Centre University of Tromso. pp. 1-46."},{"key":"atypb32","doi-asserted-by":"publisher","DOI":"10.1016\/S0010-4655(01)00325-3"},{"key":"atypb33","doi-asserted-by":"publisher","DOI":"10.1016\/j.peva.2004.10.004"},{"volume-title":"The Lattice-Boltzmann Equation","year":"2001","author":"Succi, S.","key":"atypb34"},{"key":"atypb35","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/1763.001.0001"},{"key":"atypb36","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2008.4"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342010383138","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342010383138","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,28]],"date-time":"2025-01-28T09:54:15Z","timestamp":1738058055000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342010383138"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,10,26]]},"references-count":36,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2011,5]]}},"alternative-id":["10.1177\/1094342010383138"],"URL":"https:\/\/doi.org\/10.1177\/1094342010383138","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2010,10,26]]}}}