{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:08:07Z","timestamp":1750306087060,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2017,8,30]],"date-time":"2017-08-30T00:00:00Z","timestamp":1504051200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"\u201cExcellence Initiative\u201d of the German Federal and State Governments and the Graduate School of Computational Engineering at Technische Universit\u00e4t Darmstadt"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2017,9,30]]},"abstract":"<jats:p>Optimal code performance is (besides correctness and accuracy) the most important objective in compute intensive applications. In many of these applications, Graphic Processing Units (GPUs) are used because of their high amount of compute power. However, caused by their massively parallel architecture, the code has to be specifically adjusted to the underlying hardware to achieve optimal performance and therefore has to be reoptimized for each new generation. In reality, this is usually not the case as productive code is normally at least several years old and nobody has the time to continuously adjust existing code to new hardware. In recent years more and more approaches have emerged that automatically tune the performance of applications toward the underlying hardware. In this article, we present the MATOG auto-tuner and its concepts. It abstracts the array memory access in CUDA applications and automatically optimizes the code according to the used GPUs. 
MATOG only requires few profiling runs to analyze even complex applications, while achieving significant speedups over non-optimized code, independent of the used GPU generation and without the need to manually tune the code.<\/jats:p>","DOI":"10.1145\/3106341","type":"journal-article","created":{"date-parts":[[2017,8,30]],"date-time":"2017-08-30T12:52:18Z","timestamp":1504097538000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["MATOG"],"prefix":"10.1145","volume":"14","author":[{"given":"Nicolas","family":"Weber","sequence":"first","affiliation":[{"name":"Graduate School of Computational Engineering, TU Darmstadt,Rundeturmst, Darmstadt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Goesele","sequence":"additional","affiliation":[{"name":"TU Darmstadt, Rundeturmst, Darmstadt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,8,30]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628092"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1468075.1468121"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807611"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/InPar.2012.6339587"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1693453.1693471"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2004.65"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/37401.37414"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/XSW.2013.7"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-13374-9_4"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1513895.1513902"},{"key"
:"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ESTIMedia.2014.6962353"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-48096-0_21"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600212.2600704"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807606"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/2738600.2738604"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5160988"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400718"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2716282.2716284"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628087"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063402"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11515-8_10"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.59"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSoC.2015.10"},{"key":"e_1_2_1_25_1","unstructured":"NVIDIA. 2014. NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110\/210 V1.1. http:\/\/images.nvidia.com\/content\/pdf\/tesla\/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf."},{"key":"e_1_2_1_26_1","unstructured":"NVIDIA. 2016. CUDA Programming Guide v8.0. http:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/3019057.3019065"},{"volume-title":"Proc. PMBS.","author":"Pennycock S. 
J.","key":"e_1_2_1_28_1","unstructured":"S. J. Pennycock, J. D. Sewall, and V. W. Lee. 2016. A metric for performance portability. In Proc. PMBS."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/RT.2006.280219"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31464-3_63"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/InPar.2012.6339606"},{"key":"e_1_2_1_32_1","volume-title":"GTC2010","author":"Volkov Vasily","year":"2010","unstructured":"Vasily Volkov. 2010. Better Performance at Lower Occupancy. In GTC2010."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2483954.2483963"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2832087.2832093"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/2855568.2855580"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2916026.2916031"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018755"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3106341","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3106341","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:30:17Z","timestamp":1750217417000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3106341"}},"subtitle":["Array Layout Auto-Tuning for 
CUDA"],"short-title":[],"issued":{"date-parts":[[2017,8,30]]},"references-count":37,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,9,30]]}},"alternative-id":["10.1145\/3106341"],"URL":"https:\/\/doi.org\/10.1145\/3106341","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2017,8,30]]},"assertion":[{"value":"2016-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-08-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}