{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,20]],"date-time":"2025-07-20T04:30:59Z","timestamp":1752985859460,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":29,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,11,15]],"date-time":"2020-11-15T00:00:00Z","timestamp":1605398400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,11,17]]},"DOI":"10.1145\/3426422.3426980","type":"proceedings-article","created":{"date-parts":[[2020,11,24]],"date-time":"2020-11-24T21:30:21Z","timestamp":1606253421000},"page":"43-56","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["DelayRepay: delayed execution for kernel fusion in Python"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2974-2767","authenticated-orcid":false,"given":"John Magnus","family":"Morton","sequence":"first","affiliation":[{"name":"University of Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kuba","family":"Kaszyk","sequence":"additional","affiliation":[{"name":"University of Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lu","family":"Li","sequence":"additional","affiliation":[{"name":"University of Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiawen","family":"Sun","sequence":"additional","affiliation":[{"name":"University of Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christophe","family":"Dubach","sequence":"additional","affiliation":[{"name":"McGill University, 
Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michel","family":"Steuwer","sequence":"additional","affiliation":[{"name":"University of Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Murray","family":"Cole","sequence":"additional","affiliation":[{"name":"University of Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1619-5052","authenticated-orcid":false,"given":"Michael F. P.","family":"O'Boyle","sequence":"additional","affiliation":[{"name":"University of Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,11,15]]},"reference":[{"key":"e_1_3_2_2_1_1","first-page":"173","volume-title":"Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015","author":"Ashari Arash","year":"2015","unstructured":"Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, and P. Sadayappan. On optimizing machine learning workloads via kernel fusion. In Albert Cohen and David Grove, editors, Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, USA, February 7-11, 2015, pages 173-182.
ACM, 2015."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-25935-0_1"},{"key":"e_1_3_2_2_3_1","first-page":"1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Bauer Michael","year":"2019","unstructured":"Michael Bauer and Michael Garland. Legate NumPy: Accelerated and distributed array computing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-23, 2019."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.71"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654119"},{"key":"e_1_3_2_2_6_1","first-page":"4","volume-title":"30th European Conference on Object-Oriented Programming (ECOOP 2016), volume 56 of Leibniz International Proceedings in Informatics (LIPIcs)","author":"Bolz Carl Friedrich","year":"2016","unstructured":"Carl Friedrich Bolz, Darya Kurilova, and Laurence Tratt. Making an Embedded DBMS JIT-friendly. In Shriram Krishnamurthi and Benjamin S. Lerner, editors, 30th European Conference on Object-Oriented Programming (ECOOP 2016), volume 56 of Leibniz International Proceedings in Informatics (LIPIcs), pages 4:1-4:24, Dagstuhl, Germany, 2016.
Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik."},{"key":"e_1_3_2_2_7_1","first-page":"47","volume-title":"Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2011","author":"Catanzaro Bryan","year":"2011","unstructured":"Bryan Catanzaro, Michael Garland, and Kurt Keutzer. Copperhead: compiling an embedded data parallel language. In Calin Cascaval and Pen-Chung Yew, editors, Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2011, San Antonio, TX, USA, February 12-16, 2011, pages 47-56. ACM, 2011."},{"key":"e_1_3_2_2_8_1","volume-title":"Int. J. Parallel Program., 21(5):313-347","author":"Feautrier Paul","year":"1992","unstructured":"Paul Feautrier. Some efficient solutions to the affine scheduling problem. I. One-dimensional time. Int. J. Parallel Program., 21(5):313-347, 1992."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313808.3313819"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3050748.3050761"},{"key":"e_1_3_2_2_11_1","first-page":"1","volume-title":"Proceedings of the 14th International Conference on Modularity, MODULARITY 2015","author":"Grimmer Matthias","year":"2015","unstructured":"Matthias Grimmer, Chris Seaton, Thomas W\u00fcrthinger, and Hanspeter M\u00f6ssenb\u00f6ck. Dynamically composing languages in a modular way: supporting C extensions for dynamic languages. In Robert B.
France, Sudipto Ghosh, and Gary T. Leavens, editors, Proceedings of the 14th International Conference on Modularity, MODULARITY 2015, Fort Collins, CO, USA, March 16-19, 2015, pages 1-13. ACM, 2015."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3359619.3359743"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/645671.665526"},{"key":"e_1_3_2_2_14_1","volume-title":"PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation. Parallel Computing, 38(3):157-174","author":"Kl\u00f6ckner Andreas","year":"2012","unstructured":"Andreas Kl\u00f6ckner, Nicolas Pinto, Yunsup Lee, B. Catanzaro, Paul Ivanov, and Ahmed Fasih. PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation. Parallel Computing, 38(3):157-174, 2012."},{"key":"e_1_3_2_2_15_1","unstructured":"Mads RB Kristensen, Simon AF Lund, Troels Blum, Kenneth Skovhede, and Brian Vinter.
Bohrium: Unmodified NumPy code on CPU, GPU, and cluster."},{"key":"e_1_3_2_2_16_1","first-page":"71","volume-title":"Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT 2016","author":"Burgdorf Kristensen Mads Ruben","year":"2016","unstructured":"Mads Ruben Burgdorf Kristensen, Simon Andreas Frimann Lund, Troels Blum, and James Avery. Fusion of parallel array operations. In Ayal Zaks, Bilha Mendelson, Lawrence Rauchwerger, and Wenmei W. Hwu, editors, Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, PACT 2016, Haifa, Israel, September 11-15, 2016, pages 71-85. ACM, 2016."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2833157.2833162"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2500365.2500595"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-92bf1922-00a"},{"key":"e_1_3_2_2_20_1","volume-title":"CuPy: A NumPy-compatible library for NVIDIA GPU calculations. 31st Conference on Neural Information Processing Systems, page 151","author":"Nishino Royud","year":"2017","unstructured":"Royud Nishino and Shohei Hido Crissman Loomis. CuPy: A NumPy-compatible library for NVIDIA GPU calculations.
31st Conference on Neural Information Processing Systems, page 151, 2017."},{"key":"e_1_3_2_2_21_1","first-page":"45","volume-title":"Conference on Innovative Data Systems Research (CIDR)","author":"Palkar Shoumik","year":"2017","unstructured":"Shoumik Palkar, James J Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, Matei Zaharia, and Stanford InfoLab. Weld: A common runtime for high performance data analytics. In Conference on Innovative Data Systems Research (CIDR), page 45, 2017."},{"key":"e_1_3_2_2_22_1","volume-title":"Just-in-time Acceleration of JavaScript. Technical report","author":"Pitambare Uday","year":"2013","unstructured":"Uday Pitambare, Arun Chauhan, and Saurabh Malviya. Just-in-time Acceleration of JavaScript. Technical report, 2013."},{"key":"e_1_3_2_2_23_1","volume-title":"ACM Transactions on Mathematical Software, 43(3)","author":"Rathgeber Florian","year":"2016","unstructured":"Florian Rathgeber, David A. Ham, Lawrence Mitchell, Michael Lange, Fabio Luporini, Andrew T. T. McRae, Gheorghe-Teodor Bercea, Graham R. Markall, and Paul H. J. Kelly. Firedrake: Automating the finite element method by composing abstractions.
ACM Transactions on Mathematical Software, 43(3), December 2016."},{"key":"e_1_3_2_2_24_1","volume-title":"Single assignment C: efficient support for high-level array operations in a functional setting. J. Funct. Program., 13(6):1005-1059","author":"Scholz Sven-Bodo","year":"2003","unstructured":"Sven-Bodo Scholz. Single assignment C: efficient support for high-level array operations in a functional setting. J. Funct. Program., 13(6):1005-1059, 2003."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/INTERACT.2011.18"},{"key":"e_1_3_2_2_26_1","volume-title":"ACM Transactions on Embedded Computing Systems, 13(4s)","author":"Sujeeth Arvind K.","year":"2014","unstructured":"Arvind K. Sujeeth, Kevin J. Brown, Hyoukjoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. Delite: A compiler architecture for performance-oriented embedded domain-specific languages. ACM Transactions on Embedded Computing Systems, 13(4s), April 2014."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, St\u00e9fan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, \u0130lhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R Harris, Anne M. Archibald, Ant\u00f4nio H.
Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 2020.","DOI":"10.1038\/s41592-020-0772-5"},{"key":"e_1_3_2_2_28_1","first-page":"344","volume-title":"2010 IEEE\/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing","author":"Wang Guibin","year":"2010","unstructured":"Guibin Wang, YiSong Lin, and Wei Yi. Kernel fusion: An effective method for better power efficiency on multithreaded GPU. In 2010 IEEE\/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing, pages 344-350.
IEEE, 2010."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2012.300"}],"event":{"name":"SPLASH '20: Conference on Systems, Programming, Languages, and Applications, Software for Humanity","acronym":"SPLASH '20","location":"Virtual USA"},"container-title":["Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3426422.3426980","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3426422.3426980","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:33Z","timestamp":1750195893000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3426422.3426980"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,15]]},"references-count":29,"alternative-id":["10.1145\/3426422.3426980","10.1145\/3426422"],"URL":"https:\/\/doi.org\/10.1145\/3426422.3426980","relation":{},"subject":[],"published":{"date-parts":[[2020,11,15]]},"assertion":[{"value":"2020-11-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}