{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T11:58:03Z","timestamp":1759838283735,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,6,3]],"date-time":"2021-06-03T00:00:00Z","timestamp":1622678400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Department of Energy, National Nuclear Security Administration","award":["DE-NA0003969"],"award-info":[{"award-number":["DE-NA0003969"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,6,3]]},"DOI":"10.1145\/3447818.3460376","type":"proceedings-article","created":{"date-parts":[[2021,6,4]],"date-time":"2021-06-04T15:09:36Z","timestamp":1622819376000},"page":"467-478","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["A performance portability framework for Python"],"prefix":"10.1145","author":[{"given":"Nader","family":"Al Awar","sequence":"first","affiliation":[{"name":"The University of Texas at Austin"}]},{"given":"Steven","family":"Zhu","sequence":"additional","affiliation":[{"name":"The University of Texas at Austin"}]},{"given":"George","family":"Biros","sequence":"additional","affiliation":[{"name":"The University of Texas at Austin"}]},{"given":"Milos","family":"Gligoric","sequence":"additional","affiliation":[{"name":"The University of Texas at Austin"}]}],"member":"320","published-online":{"date-parts":[[2021,6,4]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2012. MyPy. https:\/\/github.com\/python\/mypy.  2012. MyPy. https:\/\/github.com\/python\/mypy."},{"key":"e_1_3_2_1_2_1","unstructured":"2015. Kokkos Tutorials. https:\/\/github.com\/kokkos\/kokkos-tutorials.  2015. Kokkos Tutorials. https:\/\/github.com\/kokkos\/kokkos-tutorials."},{"key":"e_1_3_2_1_3_1","unstructured":"2016. KokkosP Profiling Tools. https:\/\/github.com\/kokkos\/kokkos-tools.  2016. KokkosP Profiling Tools. https:\/\/github.com\/kokkos\/kokkos-tools."},{"key":"e_1_3_2_1_4_1","unstructured":"2017. ExaMiniMD. https:\/\/github.com\/ECP-copa\/ExaMiniMD.  2017. ExaMiniMD. https:\/\/github.com\/ECP-copa\/ExaMiniMD."},{"key":"e_1_3_2_1_5_1","unstructured":"2020. typing - Support for type hints. https:\/\/docs.python.org\/3\/library\/typing.html.  2020. typing - Support for type hints. https:\/\/docs.python.org\/3\/library\/typing.html."},{"key":"e_1_3_2_1_6_1","volume-title":"USENIX Symposium on Operating Systems Design and Implementation. 265--283","author":"Abadi Martin","year":"2016","unstructured":"Martin Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , Manjunath Kudlur , Josh Levenberg , Rajat Monga , Sherry Moore , Derek G. Murray , Benoit Steiner , Paul Tucker , Vijay Vasudevan , Pete Warden , Martin Wicke , Yuan Yu , and Xiaoqiang Zheng . 2016 . TensorFlow: A system for large-scale machine learning . In USENIX Symposium on Operating Systems Design and Implementation. 265--283 . Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. In USENIX Symposium on Operating Systems Design and Implementation. 265--283."},{"key":"e_1_3_2_1_7_1","volume-title":"RAJA: Portable Performance for Large-Scale Scientific Applications. In Workshop on Performance, Portability and Productivity in HPC. 71--81","author":"Beckingsale David A.","year":"2019","unstructured":"David A. Beckingsale , Jason Burmark , Rich Hornung , Holger Jones , William Killian , Adam J. Kunen , Olga Pearce , Peter Robinson , Brian S. Ryujin , and Thomas RW Scogland . 2019 . RAJA: Portable Performance for Large-Scale Scientific Applications. In Workshop on Performance, Portability and Productivity in HPC. 71--81 . David A. Beckingsale, Jason Burmark, Rich Hornung, Holger Jones, William Killian, Adam J. Kunen, Olga Pearce, Peter Robinson, Brian S. Ryujin, and Thomas RW Scogland. 2019. RAJA: Portable Performance for Large-Scale Scientific Applications. In Workshop on Performance, Portability and Productivity in HPC. 71--81."},{"key":"e_1_3_2_1_8_1","volume-title":"Dag Sverre Seljebotn, and Kurt Smith","author":"Behnel Stefan","year":"2011","unstructured":"Stefan Behnel , Robert Bradshaw , Craig Citro , Lisandro Dalcin , Dag Sverre Seljebotn, and Kurt Smith . 2011 . Cython : The Best of Both Worlds. In Computing in Science and Engineering . 31--39. Stefan Behnel, Robert Bradshaw, Craig Citro, Lisandro Dalcin, Dag Sverre Seljebotn, and Kurt Smith. 2011. Cython: The Best of Both Worlds. In Computing in Science and Engineering. 31--39."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1137\/141000671"},{"key":"e_1_3_2_1_10_1","first-page":"1","article-title":"Design, Implementation, and Application of GPU-based Java Bytecode Interpreters. In Conference on Object-Oriented Programming","volume":"177","author":"Celik Ahmet","year":"2019","unstructured":"Ahmet Celik , Pengyu Nie , Christopher J. Rossbach , and Milos Gligoric . 2019 . Design, Implementation, and Application of GPU-based Java Bytecode Interpreters. In Conference on Object-Oriented Programming , Systems, Languages, and Applications. 177 : 1 -- 177 :28. Ahmet Celik, Pengyu Nie, Christopher J. Rossbach, and Milos Gligoric. 2019. Design, Implementation, and Application of GPU-based Java Bytecode Interpreters. In Conference on Object-Oriented Programming, Systems, Languages, and Applications. 177:1--177:28.","journal-title":"Systems, Languages, and Applications."},{"key":"e_1_3_2_1_11_1","volume-title":"Exploiting High-performance Heterogeneous Hardware for Java Programs Using Graal. In International Conference on Managed Languages & Runtimes. 4:1--4:13","author":"Clarkson James","year":"2018","unstructured":"James Clarkson , Juan Fumero , Michail Papadimitriou , Foivos S. Zakkak , Maria Xekalaki , Christos Kotselidis , and Mikel Lujan . 2018 . Exploiting High-performance Heterogeneous Hardware for Java Programs Using Graal. In International Conference on Managed Languages & Runtimes. 4:1--4:13 . James Clarkson, Juan Fumero, Michail Papadimitriou, Foivos S. Zakkak, Maria Xekalaki, Christos Kotselidis, and Mikel Lujan. 2018. Exploiting High-performance Heterogeneous Hardware for Java Programs Using Graal. In International Conference on Managed Languages & Runtimes. 4:1--4:13."},{"key":"e_1_3_2_1_12_1","volume-title":"Boosting Java Performance Using GPGPUs. In International Conference on Architecture of Computing Systems. 59--70","author":"Clarkson James","year":"2017","unstructured":"James Clarkson , Christos Kotselidis , Gavin Brown , and Mikel Luj\u00e1n . 2017 . Boosting Java Performance Using GPGPUs. In International Conference on Architecture of Computing Systems. 59--70 . James Clarkson, Christos Kotselidis, Gavin Brown, and Mikel Luj\u00e1n. 2017. Boosting Java Performance Using GPGPUs. In International Conference on Architecture of Computing Systems. 59--70."},{"key":"e_1_3_2_1_13_1","unstructured":"CudaUVM 2013. Unified Memory in CUDA 6. https:\/\/developer.nvidia.com\/blog\/unified-memory-in-cuda-6.  CudaUVM 2013. Unified Memory in CUDA 6. https:\/\/developer.nvidia.com\/blog\/unified-memory-in-cuda-6."},{"key":"e_1_3_2_1_14_1","unstructured":"CUDAWebPage 2020. CUDA Zone. https:\/\/developer.nvidia.com\/cuda-zone.  CUDAWebPage 2020. CUDA Zone. https:\/\/developer.nvidia.com\/cuda-zone."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46079-6_34"},{"key":"e_1_3_2_1_16_1","volume-title":"Evaluation of Java for General Purpose GPU Computing. In International Conference on Advanced Information Networking and Applications Workshops. 1398--1404","author":"Docampo Jorge","year":"2013","unstructured":"Jorge Docampo , Sabela Ramos , Guillermo L. Taboada , Roberto R. Exp\u00f3sito , Juan Touri\u00f1o , and Ram\u00f3n Doallo . 2013 . Evaluation of Java for General Purpose GPU Computing. In International Conference on Advanced Information Networking and Applications Workshops. 1398--1404 . Jorge Docampo, Sabela Ramos, Guillermo L. Taboada, Roberto R. Exp\u00f3sito, Juan Touri\u00f1o, and Ram\u00f3n Doallo. 2013. Evaluation of Java for General Purpose GPU Computing. In International Conference on Advanced Information Networking and Applications Workshops. 1398--1404."},{"volume-title":"Conference on Programming Language Design and Implementation. 1--12","author":"Dubach Christophe","key":"e_1_3_2_1_17_1","unstructured":"Christophe Dubach , Perry Cheng , Rodric Rabbah , David F. Bacon , and Stephen J. Fink . 2012. Compiling a High-level Language for GPUs: (via Language Support for Architectures and Compilers) . In Conference on Programming Language Design and Implementation. 1--12 . Christophe Dubach, Perry Cheng, Rodric Rabbah, David F. Bacon, and Stephen J. Fink. 2012. Compiling a High-level Language for GPUs: (via Language Support for Architectures and Compilers). In Conference on Programming Language Design and Implementation. 1--12."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.07.003"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Max Grossman Shams Imam and Vivek Sarkar. 2015. HJ-OpenCL: Reducing the Gap Between the JVM and Accelerators. In Principles and Practices of Programming on The Java Platform. 2--15.  Max Grossman Shams Imam and Vivek Sarkar. 2015. HJ-OpenCL: Reducing the Gap Between the JVM and Accelerators. In Principles and Practices of Programming on The Java Platform . 2--15.","DOI":"10.1145\/2807426.2807427"},{"key":"e_1_3_2_1_20_1","volume-title":"Effective Performance Portability. In International Workshop on Performance, Portability and Productivity in HPC (P3HPC). 24--36","author":"Harrell Stephen Lien","year":"2018","unstructured":"Stephen Lien Harrell , Joy Kitson , Robert Bird , Simon John Pennycook , Jason Sewall , Douglas Jacobsen , David Neill Asanza , Abaigail Hsu , Hector Carrillo Carrillo , Hessoo Kim , and Robert Robey . 2018 . Effective Performance Portability. In International Workshop on Performance, Portability and Productivity in HPC (P3HPC). 24--36 . Stephen Lien Harrell, Joy Kitson, Robert Bird, Simon John Pennycook, Jason Sewall, Douglas Jacobsen, David Neill Asanza, Abaigail Hsu, Hector Carrillo Carrillo, Hessoo Kim, and Robert Robey. 2018. Effective Performance Portability. In International Workshop on Performance, Portability and Productivity in HPC (P3HPC). 24--36."},{"key":"e_1_3_2_1_21_1","volume-title":"Mark Wiebe, Pearu Peterson, Pierre Gerard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant.","author":"Harris Charles R.","year":"2020","unstructured":"Charles R. Harris , K. Jarrod Millman , Stefan J. van der Walt , Ralf Gommers , Pauli Virtanen , David Cournapeau , Eric Wieser , Julian Taylor , Sebastian Berg , Nathaniel J. Smith , Robert Kern , Matti Picus , Stephan Hoyer , Marten H. van Kerkwijk , Matthew Brett , Allan Haldane , Jaime Fernandez del Rio , Mark Wiebe, Pearu Peterson, Pierre Gerard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. 2020 . Array programming with NumPy. Nature 585, 7825 (2020), 357--362. Charles R. Harris, K. Jarrod Millman, Stefan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernandez del Rio, Mark Wiebe, Pearu Peterson, Pierre Gerard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. 2020. Array programming with NumPy. Nature 585, 7825 (2020), 357--362."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Akihiro Hayashi Max Grossman Jisheng Zhao Jun Shirako and Vivek Sarkar. 2013. Accelerating Habanero-Java Programs with OpenCL Generation. In International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines Languages and Tools. 124--134.  Akihiro Hayashi Max Grossman Jisheng Zhao Jun Shirako and Vivek Sarkar. 2013. Accelerating Habanero-Java Programs with OpenCL Generation. In International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines Languages and Tools . 124--134.","DOI":"10.1145\/2500828.2500840"},{"key":"e_1_3_2_1_23_1","volume-title":"DiffTaichi: Differentiable Programming for Physical Simulation. International Conference on Learning Representations","author":"Hu Yuanming","year":"2020","unstructured":"Yuanming Hu , Luke Anderson , Tzu-Mao Li , Qi Sun , Nathan Carr , Jonathan Ragan-Kelley , and Fr\u00e9do Durand . 2020 . DiffTaichi: Differentiable Programming for Physical Simulation. International Conference on Learning Representations (2020). Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Fr\u00e9do Durand. 2020. DiffTaichi: Differentiable Programming for Physical Simulation. International Conference on Learning Representations (2020)."},{"key":"e_1_3_2_1_24_1","unstructured":"Intel. 2013. PRK. https:\/\/github.com\/ParRes\/Kernels.  Intel. 2013. PRK. https:\/\/github.com\/ParRes\/Kernels."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2833157.2833162"},{"key":"e_1_3_2_1_26_1","volume-title":"LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In International Symposium on Code Generation and Optimization. 75--86","author":"Lattner Chris","year":"2004","unstructured":"Chris Lattner and Vikram Adve . 2004 . LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In International Symposium on Code Generation and Optimization. 75--86 . Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In International Symposium on Code Generation and Optimization. 75--86."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2007.58"},{"key":"e_1_3_2_1_28_1","unstructured":"OpenMPWebPage 2020. OpenMP. https:\/\/www.openmp.org.  OpenMPWebPage 2020. OpenMP. https:\/\/www.openmp.org."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983990.2984015"},{"volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","key":"e_1_3_2_1_30_1","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , Alban Desmaison , Andreas Kopf , Edward Yang , Zachary DeVito , Martin Raison , Alykhan Tejani , Sasank Chilamkurthy , Benoit Steiner , Lu Fang , Junjie Bai , and Soumith Chintala . 2019. PyTorch: An Imperative Style , High-Performance Deep Learning Library . In Advances in Neural Information Processing Systems. 8024--8035. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems. 8024--8035."},{"volume-title":"International Conference on High Performance Computing and Communication. 375--380","author":"Pratt-Szeliga Philip C.","key":"e_1_3_2_1_31_1","unstructured":"Philip C. Pratt-Szeliga , James W. Fawcett , and Roy D. Welch . 2012. Rootbeer: Seamlessly Using GPUs from Java . In International Conference on High Performance Computing and Communication. 375--380 . Philip C. Pratt-Szeliga, James W. Fawcett, and Roy D. Welch. 2012. Rootbeer: Seamlessly Using GPUs from Java. In International Conference on High Performance Computing and Communication. 375--380."},{"key":"e_1_3_2_1_32_1","unstructured":"pybind11 2020. Pybind11 Documentation. https:\/\/pybind11.readthedocs.io\/en\/stable\/intro.html.  pybind11 2020. Pybind11 Documentation. https:\/\/pybind11.readthedocs.io\/en\/stable\/intro.html."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_3_2_1_34_1","volume-title":"Kokkos Kernels: Performance Portable Sparse\/Dense Linear Algebra and Graph Kernels. https:\/\/arxiv.org\/abs\/2103.11991. [arxiv]2103.11991 [cs.MS]","author":"Rajamanickam Sivasankaran","year":"2021","unstructured":"Sivasankaran Rajamanickam , Seher Acer , Luc Berger-Vergiat , Vinh Dang , Nathan Ellingwood , Evan Harvey , Brian Kelley , Christian R. Trott , Jeremiah Wilke , and Ichitaro Yamazaki . 2021 . Kokkos Kernels: Performance Portable Sparse\/Dense Linear Algebra and Graph Kernels. https:\/\/arxiv.org\/abs\/2103.11991. [arxiv]2103.11991 [cs.MS] Sivasankaran Rajamanickam, Seher Acer, Luc Berger-Vergiat, Vinh Dang, Nathan Ellingwood, Evan Harvey, Brian Kelley, Christian R. Trott, Jeremiah Wilke, and Ichitaro Yamazaki. 2021. Kokkos Kernels: Performance Portable Sparse\/Dense Linear Algebra and Graph Kernels. https:\/\/arxiv.org\/abs\/2103.11991. [arxiv]2103.11991 [cs.MS]"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-7b98e3ed-013"},{"key":"e_1_3_2_1_36_1","first-page":"20601","article-title":"Unsupervised Translation of Programming Languages","volume":"33","author":"Roziere Baptiste","year":"2020","unstructured":"Baptiste Roziere , Marie-Anne Lachaux , Lowik Chanussot , and Guillaume Lample . 2020 . Unsupervised Translation of Programming Languages . In Advances in Neural Information Processing Systems , Vol. 33. 20601 -- 20611 . Baptiste Roziere, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. 2020. Unsupervised Translation of Programming Languages. In Advances in Neural Information Processing Systems, Vol. 33. 20601--20611.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_37_1","unstructured":"ShedSkin 2020. Shed Skin. https:\/\/shedskin.github.io.  ShedSkin 2020. Shed Skin. https:\/\/shedskin.github.io."},{"volume-title":"Scalable Task-Based Parallelism with Python. In Parallel Applications Workshop, Alternatives To MPI. 58--72","author":"Slaughter E.","key":"e_1_3_2_1_38_1","unstructured":"E. Slaughter and A. Aiken . 2019. Pygion: Flexible , Scalable Task-Based Parallelism with Python. In Parallel Applications Workshop, Alternatives To MPI. 58--72 . E. Slaughter and A. Aiken. 2019. Pygion: Flexible, Scalable Task-Based Parallelism with Python. In Parallel Applications Workshop, Alternatives To MPI. 58--72."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-019-0686-2"}],"event":{"name":"ICS '21: 2021 International Conference on Supercomputing","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"],"location":"Virtual Event USA","acronym":"ICS '21"},"container-title":["Proceedings of the ACM International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447818.3460376","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447818.3460376","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:27Z","timestamp":1750268967000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447818.3460376"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,3]]},"references-count":39,"alternative-id":["10.1145\/3447818.3460376","10.1145\/3447818"],"URL":"https:\/\/doi.org\/10.1145\/3447818.3460376","relation":{},"subject":[],"published":{"date-parts":[[2021,6,3]]},"assertion":[{"value":"2021-06-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}