{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T21:54:56Z","timestamp":1740174896671,"version":"3.37.3"},"reference-count":9,"publisher":"Wiley","license":[{"start":{"date-parts":[[2018,5,28]],"date-time":"2018-05-28T00:00:00Z","timestamp":1527465600000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Qualcomm Canada"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Scientific Programming"],"published-print":{"date-parts":[[2018,5,28]]},"abstract":"<jats:p>We propose and evaluate a novel strategy for tuning the performance of a class of stencil computations on Graphics Processing Units. The strategy uses a machine learning model to predict the optimal way to load data from memory followed by a heuristic that divides other optimizations into groups and exhaustively explores one group at a time. We use a set of 104 synthetic OpenCL stencil benchmarks that are representative of many real stencil computations. We first demonstrate the need for auto-tuning by showing that the optimization space is sufficiently complex that simple approaches to determining a high-performing configuration fail. We then demonstrate the effectiveness of our approach on NVIDIA and AMD GPUs. Relative to a random sampling of the space, we find configurations that are 12%\/32% faster on the NVIDIA\/AMD platform in 71% and 4% less time, respectively. Relative to an expert search, we achieve 5% and 9% better performance on the two platforms in 89% and 76% less time. We also evaluate our strategy for different stencil computational intensities, varying array sizes and shapes, and in combination with expert search.<\/jats:p>","DOI":"10.1155\/2018\/6093054","type":"journal-article","created":{"date-parts":[[2018,5,28]],"date-time":"2018-05-28T19:31:03Z","timestamp":1527535863000},"page":"1-24","source":"Crossref","is-referenced-by-count":4,"title":["A Strategy for Automatic Performance Tuning of Stencil Computations on GPUs"],"prefix":"10.1155","volume":"2018","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7834-8339","authenticated-orcid":true,"given":"Joseph D.","family":"Garvey","sequence":"first","affiliation":[{"name":"Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2985-4873","authenticated-orcid":true,"given":"Tarek S.","family":"Abdelrahman","sequence":"additional","affiliation":[{"name":"Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4"}]}],"member":"311","reference":[{"key":"2","doi-asserted-by":"publisher","DOI":"10.1137\/120883153"},{"key":"3","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.10.011"},{"issue":"1","key":"6","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","year":"2001","journal-title":"Machine Learning"},{"year":"2012","key":"7"},{"key":"11","doi-asserted-by":"publisher","DOI":"10.1090\/S0002-9947-1954-0059635-7"},{"volume-title":"Artificial Intelligence: A Modern Approach","year":"2003","key":"15"},{"key":"18","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"19","doi-asserted-by":"publisher","DOI":"10.1145\/5666.5673"},{"key":"37","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-01970-8_89"}],"container-title":["Scientific Programming"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/sp\/2018\/6093054.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/sp\/2018\/6093054.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/sp\/2018\/6093054.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2018,5,28]],"date-time":"2018-05-28T19:31:09Z","timestamp":1527535869000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.hindawi.com\/journals\/sp\/2018\/6093054\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,5,28]]},"references-count":9,"alternative-id":["6093054","6093054"],"URL":"https:\/\/doi.org\/10.1155\/2018\/6093054","relation":{},"ISSN":["1058-9244","1875-919X"],"issn-type":[{"type":"print","value":"1058-9244"},{"type":"electronic","value":"1875-919X"}],"subject":[],"published":{"date-parts":[[2018,5,28]]}}}