{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T07:14:25Z","timestamp":1763968465231,"version":"3.40.3"},"publisher-location":"Cham","reference-count":28,"publisher":"Springer International Publishing","isbn-type":[{"type":"print","value":"9783030839772"},{"type":"electronic","value":"9783030839789"}],"license":[{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,8,23]],"date-time":"2021-08-23T00:00:00Z","timestamp":1629676800000},"content-version":"vor","delay-in-days":234,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Functional languages allow rewrite-rule systems that aggressively generate a multitude of semantically-equivalent but differently-optimized code versions. In the context of GPGPU execution, this paper addresses the important question of how to compose these code versions into a single program that (near-)optimally discriminates them across different datasets. Rather than aiming at a general autotuning framework reliant on stochastic search, we argue that in some cases, a more effective solution can be obtained by customizing the tuning strategy for the compiler transformation producing the code versions.<\/jats:p><jats:p>We present a simple and highly-composable strategy which requires that the (dynamic) program property used to discriminate between code versions conforms with a certain <jats:italic>monotonicity assumption<\/jats:italic>. Assuming the monotonicity assumption holds, our strategy guarantees that if an optimal solution exists it will be found. If an optimal solution doesn\u2019t exist, our strategy produces human tractable and deterministic results that provide insights into what went wrong and how it can be fixed.<\/jats:p><jats:p>We apply our tuning strategy to the incremental-flattening transformation supported by the publicly-available Futhark compiler and compare with a previous black-box tuning solution that uses the popular OpenTuner library. We demonstrate the feasibility of our solution on a set of standard datasets of real-world applications and public benchmark suites, such as Rodinia and FinPar. We show that our approach shortens the tuning time by a factor of <jats:inline-formula><jats:alternatives><jats:tex-math>$$6\\times $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:mrow>\n                      <mml:mn>6<\/mml:mn>\n                      <mml:mo>\u00d7<\/mml:mo>\n                    <\/mml:mrow>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula> on average, and more importantly, in five out of eleven cases, it produces programs that are (as high as <jats:inline-formula><jats:alternatives><jats:tex-math>$$10\\times $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:mrow>\n                      <mml:mn>10<\/mml:mn>\n                      <mml:mo>\u00d7<\/mml:mo>\n                    <\/mml:mrow>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula>) faster than the ones produced by the OpenTuner-based technique.<\/jats:p>","DOI":"10.1007\/978-3-030-83978-9_1","type":"book-chapter","created":{"date-parts":[[2021,8,22]],"date-time":"2021-08-22T23:03:52Z","timestamp":1629673432000},"page":"3-23","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Dataset Sensitive Autotuning of\u00a0Multi-versioned Code Based on\u00a0Monotonic Properties"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9499-199X","authenticated-orcid":false,"given":"Philip","family":"Munksgaard","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Svend Lund","family":"Breddam","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1195-9722","authenticated-orcid":false,"given":"Troels","family":"Henriksen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7093-5803","authenticated-orcid":false,"given":"Fabian Cristian","family":"Gieseke","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5421-6876","authenticated-orcid":false,"given":"Cosmin","family":"Oancea","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,8,23]]},"reference":[{"key":"1_CR1","unstructured":"https:\/\/docs.nvidia.com\/cuda\/"},{"key":"1_CR2","doi-asserted-by":"publisher","unstructured":"Acar, U.A., Aksenov, V., Chargu\u00e9raud, A., Rainey, M.: Provably and practically efficient granularity control. In: PPoPP 2019, pp. 214\u2013228 (2019). https:\/\/doi.org\/10.1145\/3293883.3295725","DOI":"10.1145\/3293883.3295725"},{"key":"1_CR3","doi-asserted-by":"publisher","unstructured":"Andreetta, C., et al.: FinPar: a parallel financial benchmark 13(2) (2016). https:\/\/doi.org\/10.1145\/2898354","DOI":"10.1145\/2898354"},{"key":"1_CR4","doi-asserted-by":"crossref","unstructured":"Ansel, J., et al.: OpenTuner: an extensible framework for program autotuning. In: International Conference on Parallel Architectures and Compilation Techniques (2014). http:\/\/groups.csail.mit.edu\/commit\/papers\/2014\/ansel-pact14-opentuner.pdf","DOI":"10.1145\/2628071.2628092"},{"key":"1_CR5","doi-asserted-by":"crossref","unstructured":"Baghdadi, R., et al.: PENCIL: a platform-neutral compute intermediate language for accelerator programming. In: 2015 PACT, pp. 138\u2013149 (2015)","DOI":"10.1109\/PACT.2015.17"},{"issue":"1","key":"1_CR6","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1006\/jpdc.1994.1038","volume":"21","author":"GE Blelloch","year":"1994","unstructured":"Blelloch, G.E., Hardwick, J.C., Sipelstein, J., Zagha, M., Chatterjee, S.: Implementation of a portable nested data-parallel language. J. Parallel Distrib. Comput. 21(1), 4\u201314 (1994)","journal-title":"J. Parallel Distrib. Comput."},{"key":"1_CR7","doi-asserted-by":"publisher","unstructured":"Che, S., et al.: Rodinia: a benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 44\u201354 (10 2009). https:\/\/doi.org\/10.1109\/IISWC.2009.5306797","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"1_CR8","doi-asserted-by":"publisher","unstructured":"Chen, Y., et al.: Evaluating iterative optimization across 1000 datasets. In: PLDI 2010, pp. 448\u2013459. https:\/\/doi.org\/10.1145\/1806596.1806647","DOI":"10.1145\/1806596.1806647"},{"key":"1_CR9","doi-asserted-by":"publisher","unstructured":"Couto, M., Borba, P., Cunha, J., Fernandes, J.P., Pereira, R., Saraiva, J.: Products go green: worst-case energy consumption in software product lines. In: Proceedings of the 21st International Systems and Software Product Line Conference, SPLC 2017, Volume A, Sevilla, Spain, 25\u201329 September 2017, pp. 84\u201393. https:\/\/doi.org\/10.1145\/3106195.3106214","DOI":"10.1145\/3106195.3106214"},{"key":"1_CR10","doi-asserted-by":"publisher","unstructured":"Dang, F., Yu, H., Rauchwerger, L.: The R-LRPD test: speculative parallelization of partially parallel loops. In: Proceedings 16th International Parallel and Distributed Processing Symposium, pp. 20\u201329 (2002). https:\/\/doi.org\/10.1109\/IPDPS.2002.1015493","DOI":"10.1109\/IPDPS.2002.1015493"},{"key":"1_CR11","doi-asserted-by":"crossref","unstructured":"Elsman, M., Henriksen, T., Annenkov, D., Oancea, C.E.: Static interpretation of higher-order modules in Futhark: functional GPU programming in the large. Proc. ACM Program. Lang. 2(ICFP), 97:1\u201397:30 (2018)","DOI":"10.1145\/3236792"},{"key":"1_CR12","doi-asserted-by":"publisher","first-page":"1935","DOI":"10.1109\/JPROC.2018.2873289","volume":"106","author":"F Franchetti","year":"2018","unstructured":"Franchetti, F., et al.: SPIRAL: extreme performance portability. Proc. IEEE 106, 1935\u20131968 (2018)","journal-title":"Proc. IEEE"},{"key":"1_CR13","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1007\/s10766-010-0161-2","volume":"39","author":"G Fursin","year":"2011","unstructured":"Fursin, G., et al.: Milepost GCC: machine learning enabled self-tuning compiler. Int. J. Parallel Program. 39, 296\u2013327 (2011)","journal-title":"Int. J. Parallel Program."},{"key":"1_CR14","doi-asserted-by":"publisher","unstructured":"Gieseke, F., Rosca, S., Henriksen, T., Verbesselt, J., Oancea, C.E.: Massively-parallel change detection for satellite time series data with missing values. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 385\u2013396 (2020). https:\/\/doi.org\/10.1109\/ICDE48307.2020.00040","DOI":"10.1109\/ICDE48307.2020.00040"},{"key":"1_CR15","doi-asserted-by":"publisher","unstructured":"Hagedorn, B., Stoltzfus, L., Steuwer, M., Gorlatch, S., Dubach, C.: High performance stencil code generation with lift. In: ACM, pp. 100\u2013112 (2018). https:\/\/doi.org\/10.1145\/3168824","DOI":"10.1145\/3168824"},{"key":"1_CR16","doi-asserted-by":"publisher","unstructured":"Henriksen, T., Elsman, M., Oancea, C.E.: Modular acceleration: tricky cases of functional high-performance computing. In: Proceedings of the 7th ACM SIGPLAN International Workshop on Functional High-Performance Computing, pp. 10\u201321. FHPC 2018 (2018). https:\/\/doi.org\/10.1145\/3264738.3264740","DOI":"10.1145\/3264738.3264740"},{"key":"1_CR17","doi-asserted-by":"crossref","unstructured":"Henriksen, T., Hellfritzsch, S., Sadayappan, P., Oancea, C.: Compiling generalized histograms for GPU. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC 2020. IEEE Press (2020)","DOI":"10.1109\/SC41405.2020.00101"},{"key":"1_CR18","doi-asserted-by":"publisher","unstructured":"Henriksen, T., Serup, N.G.W., Elsman, M., Henglein, F., Oancea, C.E.: Futhark: purely functional GPU-programming with nested parallelism and in-place array updates. In: PLDI 2017, pp. 556\u2013571 (2017). https:\/\/doi.org\/10.1145\/3062341.3062354","DOI":"10.1145\/3062341.3062354"},{"key":"1_CR19","doi-asserted-by":"publisher","unstructured":"Henriksen, T., Thor\u00f8e, F., Elsman, M., Oancea, C.: Incremental flattening for nested data parallelism. In: PPoPP 2019, pp. 53\u201367. https:\/\/doi.org\/10.1145\/3293883.3295707","DOI":"10.1145\/3293883.3295707"},{"key":"1_CR20","doi-asserted-by":"publisher","unstructured":"Oancea, C.E., Andreetta, C., Berthold, J., Frisch, A., Henglein, F.: Financial software on GPUs: between Haskell and Fortran. In: Proceedings of the 1st ACM SIGPLAN Workshop on Functional High-Performance Computing, FHPC 2012, pp. 61\u201372 (2012). https:\/\/doi.org\/10.1145\/2364474.2364484","DOI":"10.1145\/2364474.2364484"},{"key":"1_CR21","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1007\/978-3-540-89740-8_11","volume-title":"Languages and Compilers for Parallel Computing","author":"CE Oancea","year":"2008","unstructured":"Oancea, C.E., Mycroft, A.: Set-congruence dynamic analysis for thread-level speculation (TLS). In: Amaral, J.N. (ed.) LCPC 2008. LNCS, vol. 5335, pp. 156\u2013171. Springer, Heidelberg (2008). https:\/\/doi.org\/10.1007\/978-3-540-89740-8_11"},{"key":"1_CR22","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1007\/978-3-642-36036-7_5","volume-title":"Languages and Compilers for Parallel Computing","author":"CE Oancea","year":"2013","unstructured":"Oancea, C.E., Rauchwerger, L.: A hybrid approach to proving memory reference monotonicity. In: Rajopadhye, S., Mills Strout, M. (eds.) LCPC 2011. LNCS, vol. 7146, pp. 61\u201375. Springer, Heidelberg (2013). https:\/\/doi.org\/10.1007\/978-3-642-36036-7_5"},{"key":"1_CR23","unstructured":"Oancea, C.E., Selby, J.W.A., Giesbrecht, M., Watt, S.M.: Distributed models of thread-level speculation. In: Proceedings of International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2005), pp. 920\u2013927 (2005)"},{"key":"1_CR24","doi-asserted-by":"publisher","unstructured":"Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., Amarasinghe, S.: Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In: PLDI 2013, pp. 519\u2013530. ACM (2013). https:\/\/doi.org\/10.1145\/2491956.2462176","DOI":"10.1145\/2491956.2462176"},{"key":"1_CR25","doi-asserted-by":"crossref","unstructured":"Rasch, A., Gorlatch, S.: ATF: a generic directive-based auto-tuning framework. Concurrency Comput. Pract. Exp. 31(5), e4423 (2019)","DOI":"10.1002\/cpe.4423"},{"issue":"3","key":"1_CR26","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1023\/A:1024597010150","volume":"31","author":"S Rus","year":"2003","unstructured":"Rus, S., Hoeflinger, J., Rauchwerger, L.: Hybrid analysis: static & dynamic memory reference analysis. Int. J. Parallel Program. 31(3), 251\u2013283 (2003). https:\/\/doi.org\/10.1023\/A:1024597010150","journal-title":"Int. J. Parallel Program."},{"key":"1_CR27","doi-asserted-by":"publisher","unstructured":"Steuwer, M., Fensch, C., Lindley, S., Dubach, C.: Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code. In: ICFP 2015, pp. 205\u2013217. https:\/\/doi.org\/10.1145\/2784731.2784754","DOI":"10.1145\/2784731.2784754"},{"issue":"14","key":"1_CR28","doi-asserted-by":"publisher","first-page":"2367","DOI":"10.1002\/cpe.3302","volume":"26","author":"P Thoman","year":"2014","unstructured":"Thoman, P., Jordan, H., Fahringer, T.: Compiler multiversioning for automatic task granularity control. Concurrency Comput. Pract. Exp. 26(14), 2367\u20132385 (2014). https:\/\/doi.org\/10.1002\/cpe.3302","journal-title":"Concurrency Comput. Pract. Exp."}],"container-title":["Lecture Notes in Computer Science","Trends in Functional Programming"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-83978-9_1","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,22]],"date-time":"2021-08-22T23:25:34Z","timestamp":1629674734000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-83978-9_1"}},"subtitle":["Autotuning in Futhark"],"short-title":[],"issued":{"date-parts":[[2021]]},"ISBN":["9783030839772","9783030839789"],"references-count":28,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-83978-9_1","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2021]]},"assertion":[{"value":"23 August 2021","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"TFP","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Symposium on Trends in Functional Programming","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2021","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"17 February 2021","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"19 February 2021","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"22","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"tfp2021","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"http:\/\/tfp2021.org\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"EasyChair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"18","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"6","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"33% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3-4","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}