{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T05:33:26Z","timestamp":1740893606018,"version":"3.38.0"},"reference-count":35,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2019,8,4]],"date-time":"2019-08-04T00:00:00Z","timestamp":1564876800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/100005302","name":"University of Illinois at Urbana-Champaign","doi-asserted-by":"publisher","award":["1533912"],"award-info":[{"award-number":["1533912"]}],"id":[{"id":"10.13039\/100005302","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2019,11]]},"abstract":"<jats:p> Code optimization is an intricate task that is getting more complex as computing systems evolve. Managing the program optimization process, including the implementation and evaluation of code variants, is tedious, inefficient, and errors are likely to be introduced in the process. Moreover, because each platform typically requires a different sequence of transformations to fully harness its computing power, the optimization process complexity grows as new platforms are adopted. To address these issues, systems and frameworks have been proposed to automate the code optimization process. They, however, have not been widely adopted and are primarily used by experts with deep knowledge about underlying architecture and compiler intricacies. This article describes the requirements that we believe necessary for making automatic performance tuning more broadly used, especially in complex, long-lived high-performance computing applications. 
Besides discussing limitations of current systems and strategies to overcome these, we describe the design of a system that is able to semi-automatically generate efficient platform-specific code. In the proposed system, the code optimization is programmer-guided, separately from application code, on an external file in what we call optimization programming. The language to program the optimization process is able to represent complex collections of transformations and, as a result, generate efficient platform-specific code. A database manages different optimized versions of code regions, providing a pragmatic approach to performance portability, and the framework itself has separate components, allowing the optimized code to be used on systems without installing all of the modules required for the code generation. We present experiments on two different platforms to illustrate the generation of efficient platform-specific code that performs comparably to hand-optimized, vendor-provided code. 
<\/jats:p>","DOI":"10.1177\/1094342019865606","type":"journal-article","created":{"date-parts":[[2019,8,5]],"date-time":"2019-08-05T02:52:30Z","timestamp":1564973550000},"page":"1290-1306","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["Managing code transformations for better performance portability"],"prefix":"10.1177","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8031-0652","authenticated-orcid":false,"given":"Thiago SFX","family":"Teixeira","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA"}]},{"given":"William","family":"Gropp","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA"}]},{"given":"David","family":"Padua","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA"}]}],"member":"179","published-online":{"date-parts":[[2019,8,4]]},"reference":[{"key":"bibr1-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1145\/1542476.1542481"},{"key":"bibr2-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628092"},{"key":"bibr3-1094342019865606","unstructured":"Balay S, Abhyankar S, Adams MF, et al. (2018) PETSc web page. Available at: http:\/\/www.mcs.anl.gov\/petsc (accessed 24 July 2019)."},{"key":"bibr5-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-1986-6_8"},{"key":"bibr6-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2008.103"},{"key":"bibr7-1094342019865606","unstructured":"Bergstra J, Yamins D, Cox DD (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. 
In: Proceedings of the 30th International Conference on Machine Learning - Volume 28, ICML\u201913. JMLR.org, Atlanta, GA, USA, 16\u201321 June 2013, pp. I\u2013115\u2013I\u2013123. Available at: http:\/\/dl.acm.org\/citation.cfm?id=3042817.3042832"},{"key":"bibr8-1094342019865606","doi-asserted-by":"crossref","unstructured":"Bilmes J, Asanovic K, Chin CW, et al. (1997) Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In: Proceedings of the 11th International Conference on Supercomputing, ICS \u201897, Vienna, Austria, 7\u201311 July 1997, pp. 340\u2013347. New York, NY, USA: ACM. ISBN 0-89791-902-5, DOI:10.1145\/263580.263662. Available at: http:\/\/doi.acm.org\/10.1145\/263580.263662","DOI":"10.1145\/263580.263662"},{"key":"bibr9-1094342019865606","doi-asserted-by":"crossref","unstructured":"Bondhugula U, Hartono A, Ramanujam J, et al. (2008) A practical automatic polyhedral parallelizer and locality optimizer. In: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI \u201808, Tucson, AZ, USA, 7\u201313 June 2008, pp. 101\u2013113. New York, NY, USA: ACM. ISBN 978-1-59593-860-2, DOI:10.1145\/1375581.1375595. 
Available at: http:\/\/doi.acm.org\/10.1145\/1375581.1375595","DOI":"10.1145\/1375581.1375595"},{"volume-title":"CHiLL: A framework for composing high-level loop transformations","year":"2008","author":"Chen C","key":"bibr10-1094342019865606"},{"key":"bibr11-1094342019865606","first-page":"30","volume-title":"Proceedings of the 2004 ACM\/IEEE Conference on Supercomputing, SC \u201804","author":"Chung IH","year":"2004"},{"key":"bibr12-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1109\/99.660313"},{"key":"bibr13-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1137\/070693199"},{"key":"bibr14-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69330-7_10"},{"key":"bibr15-1094342019865606","doi-asserted-by":"crossref","unstructured":"Edwards HC, Trott CR, Sunderland D (2014) Kokkos: enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing 74(12): 3202\u20133216. Available at: http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0743731514001257 (accessed 24 July 2019). Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.","DOI":"10.1016\/j.jpdc.2014.07.003"},{"key":"bibr16-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2006.55"},{"key":"bibr17-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301661"},{"key":"bibr18-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161004"},{"key":"bibr19-1094342019865606","doi-asserted-by":"publisher","DOI":"10.2172\/1169830"},{"volume-title":"Pips: A workbench for building interprocedural parallelizers, compilers and optimizers","year":"1996","author":"Keryell R","key":"bibr20-1094342019865606"},{"key":"bibr21-1094342019865606","unstructured":"Li X, Garzar\u00e1n MJ, Padua D (2004) A dynamically tuned sorting library. 
In: Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization, CGO \u201804, pp. 111. Washington, DC, USA: IEEE Computer Society. ISBN 0-7695-2102-9, Available at: http:\/\/dl.acm.org\/citation.cfm?id=977395.977663 (accessed 24 July 2019)."},{"key":"bibr22-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1109\/DSNW.2012.6264672"},{"key":"bibr23-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.59"},{"key":"bibr24-1094342019865606","unstructured":"Pennycook SJ, Sewall JD, Lee VW (2016) A metric for performance portability. arXiv e-prints. Available at: https:\/\/arxiv.org\/abs\/1611.07409 (accessed 24 July 2019)."},{"issue":"2","key":"bibr25-1094342019865606","first-page":"232","volume":"93","author":"P\u00fcschel M","year":"2005","journal-title":"Proceedings of the IEEE, special issue on \u201cProgram Generation, Optimization, and Adaptation"},{"key":"bibr26-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1145\/3150211"},{"key":"bibr27-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1145\/2858949.2784754"},{"key":"bibr28-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1145\/1989493.1989508"},{"key":"bibr29-1094342019865606","unstructured":"Tange O (2011) GNU parallel - the command-line power tool. login: The USENIX Magazine 36(1): 42\u201347. Available at: http:\/\/www.gnu.org\/s\/parallel (accessed 24 July 2019)."},{"key":"bibr30-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661203"},{"key":"bibr31-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1177\/1094342011414744"},{"key":"bibr32-1094342019865606","doi-asserted-by":"crossref","unstructured":"Vuduc R, Demmel JW, Yelick KA (2005) OSKI: a library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series 16(1): 521. 
Available at: http:\/\/stacks.iop.org\/1742-6596\/16\/i=1\/a=071","DOI":"10.1088\/1742-6596\/16\/1\/071"},{"key":"bibr33-1094342019865606","unstructured":"Whaley RC, Dongarra JJ (1998) Automatically tuned linear algebra software. In: Proceedings of the 1998 ACM\/IEEE Conference on Supercomputing, SC \u201898, pp. 1\u201327. Washington, DC, USA: IEEE Computer Society. ISBN 0-89791-984-X. Available at: http:\/\/dl.acm.org\/citation.cfm?id=509058.509096 (accessed 24 July 2019)."},{"key":"bibr34-1094342019865606","unstructured":"Wolfe M (2016) Compilers and more: what makes performance portable? Available at: https:\/\/www.hpcwire.com\/2016\/04\/19\/compilers-makes-performance-portabl (accessed 19 April 2016)."},{"key":"bibr35-1094342019865606","unstructured":"XPACC (2018) Center for the exascale simulation of plasma-coupled combustion web page. Available at: http:\/\/xpacc.illinois.edu (accessed 24 July 2019)."},{"key":"bibr36-1094342019865606","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2007.370637"}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342019865606","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342019865606","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342019865606","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T08:35:53Z","timestamp":1740818153000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342019865606"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,4]]},"references-count":35,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2019,11]]}},"alternative-id":["10.1177\/1094342019865606"],"URL":"https:\/\/doi.org\/10.1177\/1094342019865606","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2019,8,4]]}}}