{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T18:24:56Z","timestamp":1771698296157,"version":"3.50.1"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T00:00:00Z","timestamp":1611100800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,3,31]]},"abstract":"<jats:p>\n            Auto-tuning is a popular approach to program optimization: it automatically finds good configurations of a program\u2019s so-called tuning parameters whose values are crucial for achieving high performance for a particular parallel architecture and characteristics of input\/output data. We present three new contributions of the Auto-Tuning Framework (ATF), which enable a key advantage in\n            <jats:italic>general-purpose auto-tuning<\/jats:italic>\n            : efficiently optimizing programs whose tuning parameters have\n            <jats:italic>interdependencies<\/jats:italic>\n            among them. 
We make the following contributions to the three main phases of general-purpose auto-tuning: (1) ATF\n            <jats:italic>generates<\/jats:italic>\n            the search space of interdependent tuning parameters with high performance by efficiently exploiting parameter constraints; (2) ATF\n            <jats:italic>stores<\/jats:italic>\n            such search spaces efficiently in memory, based on a novel chain-of-trees search space structure; (3) ATF\n            <jats:italic>explores<\/jats:italic>\n            these search spaces faster, by employing a multi-dimensional search strategy on its chain-of-trees search space representation. Our experiments demonstrate that, compared to the state-of-the-art, general-purpose auto-tuning frameworks, ATF substantially improves generating, storing, and exploring the search space of interdependent tuning parameters, thereby enabling an efficient overall auto-tuning process for important applications from popular domains, including stencil computations, linear algebra routines, quantum chemistry computations, and data mining algorithms.\n          <\/jats:p>","DOI":"10.1145\/3427093","type":"journal-article","created":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T17:26:38Z","timestamp":1611163598000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["Efficient Auto-Tuning of Parallel Programs with Interdependent Tuning Parameters via Auto-Tuning Framework (ATF)"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0286-0755","authenticated-orcid":false,"given":"Ari","family":"Rasch","sequence":"first","affiliation":[{"name":"University of Muenster, Germany"}]},{"given":"Richard","family":"Schulze","sequence":"additional","affiliation":[{"name":"University of Muenster, 
Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5048-0741","authenticated-orcid":false,"given":"Michel","family":"Steuwer","sequence":"additional","affiliation":[{"name":"University of Edinburgh, United Kingdom"}]},{"given":"Sergei","family":"Gorlatch","sequence":"additional","affiliation":[{"name":"University of Muenster, Germany"}]}],"member":"320","published-online":{"date-parts":[[2021,1,20]]},"reference":[
{"key":"e_1_2_1_1_1","volume-title":"2016 IEEE International Symposium on Workload Characterization (IISWC\u201916)","author":"Ahmad M.","unstructured":"M. Ahmad and O. Khan. 2016. GPU concurrency choices in graph analytics. In 2016 IEEE International Symposium on Workload Characterization (IISWC\u201916). 1--10."},
{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628092"},
{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/293347.293348"},
{"key":"e_1_2_1_4_1","unstructured":"ATF Artifact Implementation. 2020. Retrieved from https:\/\/gitlab.com\/mdh-project\/taco2020-atf."},
{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2841200"},
{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342013493644"},
{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840311"},
{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2017.38"},
{"key":"e_1_2_1_9_1","volume-title":"Controlling a complete hardware synthesis toolchain with LARA aspects. Microprocess. Microsyst. 37, 8, Part C","author":"Cardoso Jo\u00e3o M. P.","year":"2013","unstructured":"Jo\u00e3o M. P. Cardoso, Tiago Carvalho, Jos\u00e9 G. F. Coutinho, Ricardo Nobre, Razvan Nane, Pedro C. Diniz, Zlatko Petrov, Wayne Luk, and Koen Bertels. 2013. Controlling a complete hardware synthesis toolchain with LARA aspects. Microprocess. Microsyst. 37, 8, Part C (2013), 1073--1089. DOI:https:\/\/doi.org\/10.1016\/j.micpro.2013.06.001 Special Issue on European Projects in Embedded System Design: EPESD2012."},
{"key":"e_1_2_1_10_1","unstructured":"Cedric Nugteren. 2020. CLTune Issue. Retrieved from https:\/\/github.com\/CNugteren\/CLTune\/blob\/master\/src\/searchers\/annealing.cc#L134 (commit: 2b49667)."},
{"key":"e_1_2_1_12_1","volume-title":"PATUS: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In 2011 IEEE International Parallel & Distributed Processing Symposium","author":"Christen Matthias","year":"2011","unstructured":"Matthias Christen, Olaf Schenk, and Helmar Burkhart. 2011. PATUS: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In 2011 IEEE International Parallel & Distributed Processing Symposium. IEEE, 676--687."},
{"key":"e_1_2_1_13_1","first-page":"1806","volume-title":"A model-driven approach for a new generation of adaptive libraries. CoRR abs\/1806.07060","author":"Cianfriglia Marco","year":"2018","unstructured":"Marco Cianfriglia, Flavio Vella, Cedric Nugteren, Anton Lokhmotov, and Grigori Fursin. 2018. A model-driven approach for a new generation of adaptive libraries. CoRR abs\/1806.07060 (2018), 14 pp. arxiv:1806.07060 http:\/\/arxiv.org\/abs\/1806.07060."},
{"key":"e_1_2_1_14_1","first-page":"33","article-title":"An introduction to coupled cluster theory for computational chemists","volume":"14","author":"Daniel Crawford T.","year":"2000","unstructured":"T. Daniel Crawford and Henry F. Schaefer. 2000. An introduction to coupled cluster theory for computational chemists. Rev. Comput. Chem. 14 (2000), 33--136.","journal-title":"Rev. Comput. Chem."},
{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242531.1242553"},
{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840301"},
{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-010-0161-2"},
{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3168824"},
{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161004"},
{"key":"e_1_2_1_20_1","volume-title":"Das Krebsregister-Manual der Gesellschaft der epidemiologischen Krebsregister in Deutschland e.V.","author":"Hentschel K.","unstructured":"K. Hentschel et\u00a0al. 2008. Das Krebsregister-Manual der Gesellschaft der epidemiologischen Krebsregister in Deutschland e.V. Zuckschwerdt Verlag."},
{"key":"e_1_2_1_21_1","unstructured":"Intel. 2020. Math Kernel Library. Retrieved from https:\/\/software.intel.com\/en-us\/mkl."},
{"key":"e_1_2_1_22_1","unstructured":"Intel. 2020. Math Kernel Library for Deep Learning Networks. Retrieved from https:\/\/software.intel.com\/en-us\/articles\/intel-mkl-dnn-part-1-library-overview-and-installation."},
{"key":"e_1_2_1_23_1","unstructured":"ISO\/IEC. 2017. ISO international standard ISO\/IEC 14882:2017\u2014Programming language C++."},
{"key":"e_1_2_1_24_1","volume-title":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS\u201915)","author":"Jan\u00dfen B.","unstructured":"B. Jan\u00dfen, F. Schwiegelshohn, M. Koedam, F. Duhem, L. Masing, S. Werner, C. Huriaux, A. Courtay, E. Wheatley, K. Goossens, F. Lemonnier, P. Millet, J. Becker, O. Sentieys, and M. H\u00fcbner. 2015. Designing applications for heterogeneous many-core architectures with the FlexTiles Platform. In 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS\u201915). 254--261."},
{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},
{"key":"e_1_2_1_26_1","volume-title":"2016 International Conference on Parallel Architecture and Compilation Techniques (PACT\u201916)","author":"Jia Z.","unstructured":"Z. Jia, C. Xue, G. Chen, J. Zhan, L. Zhang, Y. Lin, and P. Hofstee. 2016. Auto-tuning Spark big data workloads on POWER8: Prediction-based dynamic SMT threading. In 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT\u201916). 387--400."},
{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2019.00015"},
{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2011.2178620"},
{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 2019 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201919)","author":"Kim Jinsung","unstructured":"Jinsung Kim, Aravind Sukumaran-Rajam, Vineeth Thumma, Sriram Krishnamoorthy, Ajay Panyala, Louis-No\u00ebl Pouchet, Atanas Rountev, and P. Sadayappan. 2019. A code generator for high-performance tensor contractions on GPUs. In Proceedings of the 2019 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201919). IEEE Press, Piscataway, NJ, 85--95. http:\/\/dl.acm.org\/citation.cfm?id=3314872.3314885."},
{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219837"},
{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331553.3342613"},
{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996863"},
{"key":"e_1_2_1_33_1","volume-title":"Bound the peak performance of SGEMM on GPU with software-controlled fast memory. [Research Report] RR-7923","author":"Lai Junjie","year":"2012","unstructured":"Junjie Lai and Andr\u00e9 Seznec. 2012. Bound the peak performance of SGEMM on GPU with software-controlled fast memory. [Research Report] RR-7923, 2012. hal-00686006v1."},
{"key":"e_1_2_1_34_1","first-page":"1904","volume-title":"Cross-platform performance portability using highly parametrized SYCL kernels. CoRR abs\/1904.05347","author":"Lawson John","year":"2019","unstructured":"John Lawson, Mehdi Goli, Duncan McBain, Daniel Soutar, and Louis Sugy. 2019. Cross-platform performance portability using highly parametrized SYCL kernels. CoRR abs\/1904.05347 (2019), 11 pp. arxiv:1904.05347 http:\/\/arxiv.org\/abs\/1904.05347."},
{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2458523.2458530"},
{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694344.2694364"},
{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.59"},
{"key":"e_1_2_1_38_1","volume-title":"2015 44th International Conference on Parallel Processing. 969--978","author":"Nelson T.","unstructured":"T. Nelson, A. Rivera, P. Balaprakash, M. Hall, P. D. Hovland, E. Jessup, and B. Norris. 2015. Generating efficient tensor contractions for GPUs. In 2015 44th International Conference on Parallel Processing. 969--978."},
{"key":"e_1_2_1_39_1","unstructured":"Gustavo Niemeyer. 2018. Python-constraint. Retrieved from https:\/\/pypi.org\/project\/python-constraint\/."},
{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3204919.3204924"},
{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSoC.2015.10"},
{"key":"e_1_2_1_42_1","unstructured":"NVIDIA. 2020. cuBLAS library. Retrieved from https:\/\/developer.nvidia.com\/cublas."},
{"key":"e_1_2_1_43_1","unstructured":"NVIDIA. 2020. CUDA C++ Best Practices Guide. Retrieved from https:\/\/docs.nvidia.com\/cuda\/cuda-c-best-practices-guide\/index.html."},
{"key":"e_1_2_1_44_1","unstructured":"NVIDIA. 2020. CUDA\u00ae Deep Neural Network library. Retrieved from https:\/\/developer.nvidia.com\/cudnn."},
{"key":"e_1_2_1_45_1","unstructured":"OpenTuner. 2018. Interdependent Tuning Parameters (Issue 106). Retrieved from https:\/\/github.com\/jansel\/opentuner\/issues\/106."},
{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330377"},
{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840306"},
{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-017-0508-z"},
{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.4423"},
{"key":"e_1_2_1_50_1","volume-title":"ATF: A generic auto-tuning framework. In 2017 IEEE 19th International Conference on High Performance Computing and Communications","author":"Rasch A.","year":"2017","unstructured":"A. Rasch, M. Haidl, and S. Gorlatch. 2017. ATF: A generic auto-tuning framework. In 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC\/SmartCity\/DSS). 64--71. DOI:https:\/\/doi.org\/10.1109\/HPCC-SmartCity-DSS.2017.9"},
{"key":"e_1_2_1_51_1","volume-title":"28th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201919)","author":"Rasch A.","unstructured":"A. Rasch, R. Schulze, and S. Gorlatch. 2019. Generating portable high-performance code via multi-dimensional homomorphisms. In 28th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201919). 354--369."},
{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297280.3297330"},
{"key":"e_1_2_1_53_1","volume-title":"12th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2019)","author":"Rovder Simon","year":"2019","unstructured":"Simon Rovder, Jos\u00e9 Cano, and Michael O\u2019Boyle. 2019. Optimising convolutional neural networks inference on low-powered GPUs. In 12th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2019). 14 pp."},
{"key":"e_1_2_1_54_1","volume-title":"2009 IEEE International Symposium on Parallel & Distributed Processing. 1--12","author":"Schaa D.","unstructured":"D. Schaa and D. Kaeli. 2009. Exploring the multiple-GPU design space. In 2009 IEEE International Symposium on Parallel & Distributed Processing. 1--12."},
{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126945"},
{"key":"e_1_2_1_56_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918)","author":"Sriraman Akshitha","unstructured":"Akshitha Sriraman and Thomas F. Wenisch. 2018. \u00b5Tune: Auto-tuned threading for OLDI microservices. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918). USENIX Association, Carlsbad, CA, 177--194. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/sriraman."},
{"key":"e_1_2_1_57_1","volume-title":"Euro-Par\u201997 Parallel Processing","author":"Stenstr\u00f6m Per","unstructured":"Per Stenstr\u00f6m and Jonas Skeppstedt. 1997. A performance tuning approach for shared-memory multiprocessors. In Euro-Par\u201997 Parallel Processing, Christian Lengauer, Martin Griebl, and Sergei Gorlatch (Eds.). Springer, Berlin, 72--83."},
{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368858"},
{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3331059"},
{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.14"},
{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342019865606"},
{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661203"},
{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126939"},
{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3315508.3329973"},
{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2009.07.001"},
{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2018.08.004"},
{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355606"},
{"key":"e_1_2_1_68_1","volume-title":"2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Vijaykumar N.","unstructured":"N. Vijaykumar, K. Hsieh, G. Pekhimenko, S. Khan, A. Shrestha, S. Ghose, A. Jog, P. B. Gibbons, and O. Mutlu. 2016. Zorua: A holistic approach to resource virtualization in GPUs. In 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916). 1--14."},
{"key":"e_1_2_1_69_1","volume-title":"SC\u201998: Proceedings of the 1998 ACM\/IEEE Conference on Supercomputing. IEEE, 38","author":"Clinton Whaley R.","unstructured":"R. Clinton Whaley and Jack J. Dongarra. 1998. Automatically tuned linear algebra software. In SC\u201998: Proceedings of the 1998 ACM\/IEEE Conference on Supercomputing. IEEE, 38."},
{"key":"e_1_2_1_70_1","first-page":"67","article-title":"Numerical optimization","volume":"35","author":"Wright Stephen","year":"1999","unstructured":"Stephen Wright and Jorge Nocedal. 1999. Numerical optimization. Springer Sci. 35, 67\u201368 (1999), 7.","journal-title":"Springer Sci."},
{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243187"}
],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3427093","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3427093","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:02:24Z","timestamp":1750197744000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3427093"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,20]]},"references-count":70,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,3,31]]}},"alternative-id":["10.1145\/3427093"],"URL":"https:\/\/doi.org\/10.1145\/3427093","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,20]]},
"assertion":[{"value":"2020-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}