{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T05:34:31Z","timestamp":1741066471955,"version":"3.38.0"},"reference-count":31,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2019,1,17]],"date-time":"2019-01-17T00:00:00Z","timestamp":1547683200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/100010686","name":"H2020 European Institute of Innovation and Technology","doi-asserted-by":"publisher","award":["689772"],"award-info":[{"award-number":["689772"]}],"id":[{"id":"10.13039\/100010686","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2019,5]]},"abstract":"<jats:p> Many software mechanisms for geophysics exploration in oil and gas industries are based on wave propagation simulation. To perform such simulations, state-of-the-art high-performance computing architectures are employed, generating results faster with more accuracy at each generation. The software must evolve to support the new features of each design to keep performance scaling. Furthermore, it is important to understand the impact of each change applied to the software to improve the performance as much as possible. In this article, we propose several optimization strategies for a wave propagation model for six architectures: Intel Broadwell, Intel Haswell, Intel Knights Landing, Intel Knights Corner, NVIDIA Pascal, and NVIDIA Kepler. We focus on improving the cache memory usage, vectorization, load balancing, portability, and locality in the memory hierarchy. 
We analyze the hardware impact of the optimizations, providing insights into how each strategy can improve the performance. The results show that NVIDIA Pascal outperforms the other considered architectures by up to 8.5\u00d7. <\/jats:p>","DOI":"10.1177\/1094342018824150","type":"journal-article","created":{"date-parts":[[2019,1,18]],"date-time":"2019-01-18T03:00:58Z","timestamp":1547780458000},"page":"473-486","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":10,"title":["Optimization strategies for geophysics models on manycore systems"],"prefix":"10.1177","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5178-1036","authenticated-orcid":false,"given":"Matheus S","family":"Serpa","sequence":"first","affiliation":[{"name":"Informatics Institute, Federal University of Rio Grande do Sul, Porto Alegre, Brazil"}]},{"given":"Eduardo HM","family":"Cruz","sequence":"additional","affiliation":[{"name":"Federal Institute of Parana, Paranavai, Brazil"}]},{"given":"Matthias","family":"Diener","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana\u2013Champaign, Champaign, IL, USA"}]},{"given":"Arthur M","family":"Krause","sequence":"additional","affiliation":[{"name":"Informatics Institute, Federal University of Rio Grande do Sul, Porto Alegre, Brazil"}]},{"given":"Philippe OA","family":"Navaux","sequence":"additional","affiliation":[{"name":"Informatics Institute, Federal University of Rio Grande do Sul, Porto Alegre, Brazil"}]},{"given":"Jairo","family":"Panetta","sequence":"additional","affiliation":[{"name":"Computer Science Division, ITA, S\u00e3o Jos\u00e9 dos Campos, Brazil"}]},{"given":"Albert","family":"Farr\u00e9s","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center, Barcelona, 
Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7814-0359","authenticated-orcid":false,"given":"Claudia","family":"Rosas","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center, Barcelona, Spain"}]},{"given":"Mauricio","family":"Hanzich","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center, Barcelona, Spain"}]}],"member":"179","published-online":{"date-parts":[[2019,1,17]]},"reference":[{"key":"bibr1-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-802118-7.00023-6"},{"key":"bibr2-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063400"},{"key":"bibr3-1094342018824150","doi-asserted-by":"publisher","DOI":"10.3997\/2214-4609.201414035"},{"key":"bibr4-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.4929"},{"key":"bibr5-1094342018824150","unstructured":"Casey SD (2011) How to determine the effectiveness of hyper-threading technology with an application. Available at: https:\/\/software.intel.com\/en-us\/articles\/how-to-determine-the-effectiveness-of-hyper-threading-technology-with-\/an-application\/ (accessed October 2017)."},{"key":"bibr6-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2016.01.011"},{"key":"bibr7-1094342018824150","doi-asserted-by":"crossref","unstructured":"Chrysos G (2012) Intel Xeon Phi X100 Family Coprocessor - the Architecture. Available at: https:\/\/software.intel.com\/en-us\/articles\/intel-xeon-phi-coprocessor-codename-knights-corner (accessed October 2017).","DOI":"10.1109\/HOTCHIPS.2012.7476487"},{"key":"bibr8-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1190\/segam2015-5871173.1"},{"key":"bibr9-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1190\/1.3284053"},{"key":"bibr10-1094342018824150","unstructured":"Corbet J (2012) Toward better NUMA scheduling. 
Available at: http:\/\/lwn.net\/Articles\/486858\/ (accessed October 2017)."},{"key":"bibr11-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1145\/2975587"},{"key":"bibr12-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1109\/PDP2018.2018.00021"},{"key":"bibr13-1094342018824150","first-page":"1","volume-title":"Linux Kongress","author":"de Melo AC","year":"2010"},{"key":"bibr14-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-74466-5_17"},{"key":"bibr15-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1145\/3006385"},{"key":"bibr16-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-30961-8_2"},{"key":"bibr17-1094342018824150","unstructured":"Intel (2012) Intel performance counter monitor \u2013 a better way to measure CPU utilization. Available at: http:\/\/www.intel.com\/software\/pcm (accessed October 2017)."},{"key":"bibr18-1094342018824150","unstructured":"Intel (2014) OpenMP thread affinity control. Available at: https:\/\/software.intel.com\/en-us\/articles\/openmp-thread-affinity-control (accessed October 2017)."},{"key":"bibr19-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1109\/WOLFHPC.2016.06"},{"key":"bibr20-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1145\/1816038.1816021"},{"key":"bibr21-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1109\/ASPDAC.2015.7059044"},{"key":"bibr22-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1145\/1513895.1513905"},{"issue":"2","key":"bibr23-1094342018824150","volume":"7","author":"Niu X","year":"2014","journal-title":"ACM Transactions on Reconfigurable Technology and 
Systems"},{"key":"bibr24-1094342018824150","doi-asserted-by":"publisher","DOI":"10.3997\/2214-4609.20130646"},{"key":"bibr25-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2012.6237038"},{"key":"bibr26-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PADW.2017.17"},{"key":"bibr27-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1109\/PDP2018.2018.00058"},{"key":"bibr28-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2010.69"},{"key":"bibr29-1094342018824150","first-page":"63","volume":"25","author":"Tousimojarad A","year":"2014","journal-title":"Parallel Computing: Accelerating Computational Science and Engineering (CSE), Advances in Parallel Computing"},{"key":"bibr30-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1145\/1400097.1400102"},{"key":"bibr31-1094342018824150","doi-asserted-by":"publisher","DOI":"10.1190\/segam2013-0861.1"}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018824150","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342018824150","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018824150","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T10:27:12Z","timestamp":1740997632000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342018824150"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,17]]},"references-count":31,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,5]]}},"alternative-id":["10.1177\/1094342018824150"],"URL":"https:\/\/doi.org\/10.1177\/1094342018824150","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2019,1,17]]}}}