{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T04:55:33Z","timestamp":1769230533655,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":52,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,2,17]],"date-time":"2021-02-17T00:00:00Z","timestamp":1613520000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,2,17]]},"DOI":"10.1145\/3437801.3441587","type":"proceedings-article","created":{"date-parts":[[2021,2,20]],"date-time":"2021-02-20T23:04:20Z","timestamp":1613862260000},"page":"105-118","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Scaling implicit parallelism via dynamic control replication"],"prefix":"10.1145","author":[{"given":"Michael","family":"Bauer","sequence":"first","affiliation":[{"name":"NVIDIA"}]},{"given":"Wonchan","family":"Lee","sequence":"additional","affiliation":[{"name":"NVIDIA"}]},{"given":"Elliott","family":"Slaughter","sequence":"additional","affiliation":[{"name":"SLAC National Accelerator Laboratory"}]},{"given":"Zhihao","family":"Jia","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University"}]},{"given":"Mario","family":"Di Renzo","sequence":"additional","affiliation":[{"name":"Sapienza University of Rome"}]},{"given":"Manolis","family":"Papadakis","sequence":"additional","affiliation":[{"name":"NVIDIA"}]},{"given":"Galen","family":"Shipman","sequence":"additional","affiliation":[{"name":"Los Alamos National Laboratory"}]},{"given":"Patrick","family":"McCormick","sequence":"additional","affiliation":[{"name":"Los Alamos National Laboratory"}]},{"given":"Michael","family":"Garland","sequence":"additional","affiliation":[{"name":"NVIDIA"}]},{"given":"Alex","family":"Aiken","sequence":"additional","affiliation":[{"name":"Stanford University"}]}],"member":"320","published-online":{"date-parts":[[2021,2,17]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2013. OpenMP Application Program Interface. http:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP4.0.0.pdf.  2013. OpenMP Application Program Interface. http:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP4.0.0.pdf."},{"key":"e_1_3_2_1_2_1","unstructured":"2013. Safe Object Finalization in Python. https:\/\/www.python.org\/dev\/peps\/pep-0442\/.  2013. Safe Object Finalization in Python. https:\/\/www.python.org\/dev\/peps\/pep-0442\/."},{"key":"e_1_3_2_1_3_1","unstructured":"2019. CANDLE: Exascale Deep Learning and Simulation Enabled Precision Medicine for Cancer. https:\/\/candle.cels.anl.gov\/.  2019. CANDLE: Exascale Deep Learning and Simulation Enabled Precision Medicine for Cancer. https:\/\/candle.cels.anl.gov\/."},{"key":"e_1_3_2_1_4_1","unstructured":"2019. Uno: Predicting Tumor Dose Response across Multiple Data Sources. https:\/\/github.com\/ECP-CANDLE\/Benchmarks\/tree\/master\/Pilot1\/Uno.  2019. Uno: Predicting Tumor Dose Response across Multiple Data Sources. https:\/\/github.com\/ECP-CANDLE\/Benchmarks\/tree\/master\/Pilot1\/Uno."},{"key":"e_1_3_2_1_5_1","unstructured":"2020. June 2020 Top 500 Supercomputers. https:\/\/www.top500.org\/lists\/top500\/2020\/06\/.  2020. June 2020 Top 500 Supercomputers. https:\/\/www.top500.org\/lists\/top500\/2020\/06\/."},{"key":"e_1_3_2_1_6_1","unstructured":"2020. Regent Stencil Example. https:\/\/gitlab.com\/StanfordLegion\/legion\/-\/blob\/master\/language\/examples\/stencil.rg.  2020. Regent Stencil Example. https:\/\/gitlab.com\/StanfordLegion\/legion\/-\/blob\/master\/language\/examples\/stencil.rg."},{"key":"e_1_3_2_1_7_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dandelion Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/ Software available from tensorflow.org.  Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dandelion Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/ Software available from tensorflow.org."},{"key":"e_1_3_2_1_9_1","volume-title":"Barrier Inference. In Proceedings of the Symposium on Principles of Programming Languages. 342--354","author":"Aiken Alex","year":"1998","unstructured":"Alex Aiken and David Gay . 1998 . Barrier Inference. In Proceedings of the Symposium on Principles of Programming Languages. 342--354 . Alex Aiken and David Gay. 1998. Barrier Inference. In Proceedings of the Symposium on Principles of Programming Languages. 342--354."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356175"},{"key":"e_1_3_2_1_11_1","volume-title":"Legion: Expressing Locality and Independence with Logical Regions. In Supercomputing (SC).","author":"Bauer M.","year":"2012","unstructured":"M. Bauer , S. Treichler , E. Slaughter , and A. Aiken . 2012 . Legion: Expressing Locality and Independence with Logical Regions. In Supercomputing (SC). M. Bauer, S. Treichler, E. Slaughter, and A. Aiken. 2012. Legion: Expressing Locality and Independence with Logical Regions. In Supercomputing (SC)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.74"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1640089.1640097"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2013.98"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465017"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342007078442"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1094811.1094852"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2020.107262"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1188455.1188543"},{"key":"e_1_3_2_1_21_1","unstructured":"Charles Ferenbaugh. 2016. The PENNANT Mini-App. https:\/\/github.com\/lanl\/PENNANT\/blob\/master\/doc\/pennantdoc.pdf.  Charles Ferenbaugh. 2016. The PENNANT Mini-App. https:\/\/github.com\/lanl\/PENNANT\/blob\/master\/doc\/pennantdoc.pdf."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356205"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2015.25"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3148226.3148233"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2017.00043"},{"key":"e_1_3_2_1_27_1","volume-title":"SysML","author":"Jia Zhihao","year":"2018","unstructured":"Zhihao Jia , Matei Zaharia , and Alex Aiken . 2018. Beyond Data and Model Parallelism for Deep Neural Networks . In SysML 2018 . Zhihao Jia, Matei Zaharia, and Alex Aiken. 2018. Beyond Data and Model Parallelism for Deep Neural Networks. In SysML 2018."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2676870.2676883"},{"key":"e_1_3_2_1_29_1","volume-title":"Proceedings of OOPSLA'93","author":"Kal\u00e9 L.V.","unstructured":"L.V. Kal\u00e9 and S. Krishnan . 1993. CHARM++: A Portable Concurrent Object Oriented System Based on C++ . In Proceedings of OOPSLA'93 , A. Paepcke (Ed.). ACM Press, 91--108. L.V. Kal\u00e9 and S. Krishnan. 1993. CHARM++: A Portable Concurrent Object Oriented System Based on C++. In Proceedings of OOPSLA'93, A. Paepcke (Ed.). ACM Press, 91--108."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356199"},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis","author":"Lee Wonchan","year":"2018","unstructured":"Wonchan Lee , Elliott Slaughter , Michael Bauer , Sean Treichler , Todd Warszawski , Michael Garland , and Alex Aiken . 2018 . Dynamic Tracing: Memoization of Task Graphs for Dynamic Task-based Runtimes . In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis ( Dallas, Texas) (SC '18). IEEE Press, Piscataway, NJ, USA, Article 34, 13 pages. http:\/\/dl.acm.org\/citation.cfm?id=3291656.3291702 Wonchan Lee, Elliott Slaughter, Michael Bauer, Sean Treichler, Todd Warszawski, Michael Garland, and Alex Aiken. 2018. Dynamic Tracing: Memoization of Task Graphs for Dynamic Task-based Runtimes. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (Dallas, Texas) (SC '18). IEEE Press, Piscataway, NJ, USA, Article 34, 13 pages. http:\/\/dl.acm.org\/citation.cfm?id=3291656.3291702"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3062341.3062385"},{"key":"e_1_3_2_1_33_1","volume-title":"Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics. In USENIX Annual Technical Conference (USENIX ATC).","author":"Mashayekhi Omid","year":"2017","unstructured":"Omid Mashayekhi , Hang Qu , Chinmayee Shah , and Philip Levis . 2017 . Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics. In USENIX Annual Technical Conference (USENIX ATC). Omid Mashayekhi, Hang Qu, Chinmayee Shah, and Philip Levis. 2017. Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics. In USENIX Annual Technical Conference (USENIX ATC)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2016.7761580"},{"key":"e_1_3_2_1_35_1","volume-title":"Ray: A Distributed Framework for Emerging AI Applications. CoRR abs\/1712.05889","author":"Moritz Philipp","year":"2017","unstructured":"Philipp Moritz , Robert Nishihara , Stephanie Wang , Alexey Tumanov , Richard Liaw , Eric Liang , William Paul , Michael I. Jordan , and Ion Stoica . 2017 . Ray: A Distributed Framework for Emerging AI Applications. CoRR abs\/1712.05889 (2017). arXiv:1712.05889 http:\/\/arxiv.org\/abs\/1712.05889 Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, William Paul, Michael I. Jordan, and Ion Stoica. 2017. Ray: A Distributed Framework for Emerging AI Applications. CoRR abs\/1712.05889 (2017). arXiv:1712.05889 http:\/\/arxiv.org\/abs\/1712.05889"},{"key":"e_1_3_2_1_36_1","unstructured":"NumPy 2019. NumPy v1.16 Manual. https:\/\/docs.scipy.org\/doc\/numpy\/.  NumPy 2019. NumPy v1.16 Manual. https:\/\/docs.scipy.org\/doc\/numpy\/."},{"key":"e_1_3_2_1_37_1","unstructured":"NVIDIA 2019. GPUDirect. https:\/\/developer.nvidia.com\/gpudirect.  NVIDIA 2019. GPUDirect. https:\/\/developer.nvidia.com\/gpudirect."},{"key":"e_1_3_2_1_38_1","volume-title":"Automatic Differentiation in PyTorch. In NIPS Autodiff Workshop.","author":"Paszke Adam","year":"2017","unstructured":"Adam Paszke , Sam Gross , Soumith Chintala , Gregory Chanan , Edward Yang , Zachary DeVito , Zeming Lin , Alban Desmaison , Luca Antiga , and Adam Lerer . 2017 . Automatic Differentiation in PyTorch. In NIPS Autodiff Workshop. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic Differentiation in PyTorch. In NIPS Autodiff Workshop."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-7b98e3ed-013"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063405"},{"key":"e_1_3_2_1_41_1","volume-title":"Horovod: fast and easy distributed deep learning in TensorFlow. CoRR abs\/1802.05799","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike Del Balso . 2018. Horovod: fast and easy distributed deep learning in TensorFlow. CoRR abs\/1802.05799 ( 2018 ). http:\/\/arxiv.org\/abs\/1802.05799 Alexander Sergeev and Mike Del Balso. 2018. Horovod: fast and easy distributed deep learning in TensorFlow. CoRR abs\/1802.05799 (2018). http:\/\/arxiv.org\/abs\/1802.05799"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807629"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126949"},{"key":"e_1_3_2_1_44_1","volume-title":"Proceedings of the International Conference on Supercomputing.","author":"Slaughter E.","unstructured":"E. Slaughter , W. Wu , Y. Fu , L. Brandenburg , N. Garcia , E. Marx , K.S. Morris , Q. Cao , G. Bosilca , S. Mirchandaney , W. Lee , S. Treichler , P. McCormick , and A. Aiken . 2020. Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance . In Proceedings of the International Conference on Supercomputing. E. Slaughter, W. Wu, Y. Fu, L. Brandenburg, N. Garcia, E. Marx, K.S. Morris, Q. Cao, G. Bosilca, S. Mirchandaney, W. Lee, S. Treichler, P. McCormick, and A. Aiken. 2020. Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance. In Proceedings of the International Conference on Supercomputing."},{"key":"e_1_3_2_1_45_1","unstructured":"M. Snir S. Otto S. Huss-Lederman D. Walker and J. Dongarra. 1998. MPI-The Complete Reference. MIT Press.  M. Snir S. Otto S. Huss-Lederman D. Walker and J. Dongarra. 1998. MPI-The Complete Reference. MIT Press."},{"key":"e_1_3_2_1_46_1","unstructured":"The HDF Group. 1997--2020. Hierarchical Data Format version 5. http:\/\/www.hdfgroup.org\/HDF5\/.  The HDF Group. 1997--2020. Hierarchical Data Format version 5. http:\/\/www.hdfgroup.org\/HDF5\/."},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of PAW@SC 2019: Parallel Applications Workshop, Held in conjunction with SC19: The International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Torres Hilario","year":"2019","unstructured":"Hilario Torres , Manolis Papadakis , Lluis Jofre , Wonchan Lee , Alex Aiken , and Gianluca Iaccarino . 2019 . Soleil-X: Turbulence, Particles, and Radiation in the Regent Programming Language . In Proceedings of PAW@SC 2019: Parallel Applications Workshop, Held in conjunction with SC19: The International Conference for High Performance Computing, Networking, Storage and Analysis , Denver, Colorado, USA , November 16-22, 2019. ACM. Hilario Torres, Manolis Papadakis, Lluis Jofre, Wonchan Lee, Alex Aiken, and Gianluca Iaccarino. 2019. Soleil-X: Turbulence, Particles, and Radiation in the Regent Programming Language. In Proceedings of PAW@SC 2019: Parallel Applications Workshop, Held in conjunction with SC19: The International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, Colorado, USA, November 16-22, 2019. ACM."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628084"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"crossref","unstructured":"S. Treichler M. Bauer and A. Aiken. 2013. Language Support for Dynamic Hierarchical Data Partitioning. In Object Oriented Programming Systems Languages and Applications (OOPSLA).  S. Treichler M. Bauer and A. Aiken. 2013. Language Support for Dynamic Hierarchical Data Partitioning. In Object Oriented Programming Systems Languages and Applications (OOPSLA).","DOI":"10.1145\/2509136.2509545"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"crossref","unstructured":"S. Treichler M. Bauer Sharma R. Slaughter E. and A. Aiken. 2016. Dependent Partitioning. In Object Oriented Programming Systems Languages and Applications (OOPSLA).  S. Treichler M. Bauer Sharma R. Slaughter E. and A. Aiken. 2016. Dependent Partitioning. In Object Oriented Programming Systems Languages and Applications (OOPSLA).","DOI":"10.1145\/2983990.2984016"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1278177.1278183"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190551"},{"key":"e_1_3_2_1_53_1","volume-title":"Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia , Mosharaf Chowdhury , Tathagata Das , Ankur Dave , Justin Ma , Murphy McCauley , Michael J. Franklin , Scott Shenker , and Ion Stoica . 2012 . Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing . In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation ( San Jose, CA) (NSDI'12). USENIX Association, Berkeley, CA, USA, 2--2. http:\/\/dl.acm.org\/citation.cfm?id=2228298.2228301 Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (San Jose, CA) (NSDI'12). USENIX Association, Berkeley, CA, USA, 2--2. http:\/\/dl.acm.org\/citation.cfm?id=2228298.2228301"}],"event":{"name":"PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","location":"Virtual Event Republic of Korea","acronym":"PPoPP '21","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages","SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing"]},"container-title":["Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3437801.3441587","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3437801.3441587","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:25Z","timestamp":1750191445000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3437801.3441587"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,17]]},"references-count":52,"alternative-id":["10.1145\/3437801.3441587","10.1145\/3437801"],"URL":"https:\/\/doi.org\/10.1145\/3437801.3441587","relation":{},"subject":[],"published":{"date-parts":[[2021,2,17]]},"assertion":[{"value":"2021-02-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}