{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T07:36:55Z","timestamp":1768030615021,"version":"3.49.0"},"reference-count":25,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2016,9,18]],"date-time":"2016-09-18T00:00:00Z","timestamp":1474156800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2018,7]]},"abstract":"<jats:p> In this paper we demonstrate techniques for increasing the node-level parallelism of a deterministic discrete ordinates neutral particle transport algorithm on a structured mesh to exploit many-core technologies. Transport calculations form a large part of the computational workload of physical simulations and so good performance is vital for the simulations to complete in reasonable time. We will demonstrate our approach utilizing the SNAP mini-app, which gives a simplified implementation of the full transport algorithm but remains similar enough to the real algorithm to act as a useful proxy for research purposes. <\/jats:p><jats:p> We present an OpenCL implementation of our improved algorithm which achieves a speedup of up to 2.5 \u00d7 on a many-core GPGPU device compared to a state-of-the-art multi-core node for the transport sweep, and up to 4 \u00d7 compared to the multi-core CPUs in the largest GPU enabled supercomputer; the first time this scale of speedup has been achieved for algorithms of this class. We then discuss ways to express our scheme in OpenMP 4.0 and demonstrate the performance on an Intel Knights Corner Xeon Phi compared to the original scheme. 
<\/jats:p>","DOI":"10.1177\/1094342016668978","type":"journal-article","created":{"date-parts":[[2016,9,20]],"date-time":"2016-09-20T00:20:31Z","timestamp":1474330831000},"page":"555-569","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":6,"title":["An improved parallelism scheme for deterministic discrete ordinates transport"],"prefix":"10.1177","volume":"32","author":[{"given":"Tom","family":"Deakin","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Bristol, Bristol, UK"}]},{"given":"Simon","family":"McIntosh-Smith","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Bristol, Bristol, UK"}]},{"given":"Matt","family":"Martineau","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Bristol, Bristol, UK"}]},{"given":"Wayne","family":"Gaudin","sequence":"additional","affiliation":[{"name":"High Performance Computing, UK Atomic Weapons Establishment, Aldermaston, UK"}]}],"member":"179","published-online":{"date-parts":[[2016,9,18]]},"reference":[{"key":"bibr1-1094342016668978","first-page":"2535","volume-title":"International conference on mathematics, computational methods & reactor physics","author":"Adams MP","year":"2013"},{"key":"bibr2-1094342016668978","unstructured":"AMD. OpenCL optimization case study\u2014simple reductions. Available at: www.developer.amd.com\/resources\/documentation-articles\/articles-whitepapers\/opencl-optimization-case-study-simple-reductions\/(accessed 9 August 2016)."},{"key":"bibr3-1094342016668978","unstructured":"Bailey D, Barszcz E, Barton J, (1994) The NAS parallel benchmarks. 
Technical report, NASA, RNR-94-007."},{"key":"bibr4-1094342016668978","first-page":"1","volume-title":"International conference on mathematics, computational methods, and reactor physics","author":"Bailey TS","year":"2009"},{"key":"bibr5-1094342016668978","volume-title":"Joint international conference on mathematics and computation (M&C), supercomputing in nuclear applications (SNA) and the Monte Carlo (MC) method, ANS MC2015","author":"Baker RS","year":"2015"},{"key":"bibr6-1094342016668978","volume-title":"Supercomputing, International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Deakin T","year":"2015"},{"key":"bibr7-1094342016668978","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-41321-1_22"},{"key":"bibr8-1094342016668978","volume-title":"Joint international conference on mathematics and computation (M&C), supercomputing in nuclear applications (SNA) and the Monte Carlo (MC) method, ANS MC2015","author":"Evans TM","year":"2015"},{"key":"bibr9-1094342016668978","unstructured":"Heterogeneous System Architecture Foundation. HSA specification library. 
Available at: www.hsafoundation.com (accessed 9 August 2016)."},{"key":"bibr10-1094342016668978","first-page":"477","volume":"107","author":"Hawkins WD","year":"2012","journal-title":"Transactions of the American Nuclear Society"},{"key":"bibr11-1094342016668978","doi-asserted-by":"publisher","DOI":"10.1177\/109434200001400405"},{"key":"bibr12-1094342016668978","first-page":"198","volume":"65","author":"Koch KR","year":"1992","journal-title":"Transactions of the American Nuclear Society"},{"key":"bibr13-1094342016668978","volume-title":"Computational Methods of Neutron Transport","author":"Lewis EE","year":"1993"},{"key":"bibr14-1094342016668978","first-page":"19","author":"McCalpin JD","year":"1995","journal-title":"IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter"},{"key":"bibr15-1094342016668978","unstructured":"Message passing interface forum. (2012) MPI: A Message-Passing Interface Standard Version 3.0."},{"key":"bibr16-1094342016668978","doi-asserted-by":"publisher","DOI":"10.1016\/j.anucene.2014.08.034"},{"key":"bibr17-1094342016668978","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536243"},{"key":"bibr18-1094342016668978","unstructured":"Munshi A (2011) The OpenCL Specification, Version 1.1."},{"key":"bibr19-1094342016668978","unstructured":"OpenMP architecture review board (2011) OpenMP Application Program Interface, Version 3.1."},{"key":"bibr20-1094342016668978","author":"Pennycook SJ","year":"2010","journal-title":"Experiences with porting and modelling wavefront algorithms on many-core architectures"},{"key":"bibr21-1094342016668978","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/bxr073"},{"key":"bibr22-1094342016668978","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.73"},{"key":"bibr23-1094342016668978","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"bibr24-1094342016668978","volume-title":"Proceedings of the 2010 IEEE international symposium on parallel and 
distributed processing, IPDPS","author":"Xiao S","year":"2010"},{"key":"bibr25-1094342016668978","unstructured":"Zerr RJ, Baker RS (2013) SNAP: SN (discrete ordinates) application proxy\u2014proxy description. Technical Report, LA-UR-13-21070, Los Alamos National Laboratory."}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016668978","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342016668978","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016668978","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T19:31:59Z","timestamp":1740943919000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342016668978"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,9,18]]},"references-count":25,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,7]]}},"alternative-id":["10.1177\/1094342016668978"],"URL":"https:\/\/doi.org\/10.1177\/1094342016668978","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,9,18]]}}}