{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T23:28:44Z","timestamp":1777937324531,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,4,13]],"date-time":"2019-04-13T00:00:00Z","timestamp":1555113600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,4,13]]},"DOI":"10.1145\/3300053.3319419","type":"proceedings-article","created":{"date-parts":[[2019,4,10]],"date-time":"2019-04-10T19:07:28Z","timestamp":1554923248000},"page":"43-52","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures"],"prefix":"10.1145","author":[{"given":"K. V.","family":"Manian","sequence":"first","affiliation":[{"name":"The Ohio State University Columbus, Ohio"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"A. A.","family":"Ammar","sequence":"additional","affiliation":[{"name":"The Ohio State University Columbus, Ohio"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"A.","family":"Ruhela","sequence":"additional","affiliation":[{"name":"The Ohio State University Columbus, Ohio"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"C.-H.","family":"Chu","sequence":"additional","affiliation":[{"name":"The Ohio State University Columbus, Ohio"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"H.","family":"Subramoni","sequence":"additional","affiliation":[{"name":"The Ohio State University Columbus, Ohio"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"D. 
K.","family":"Panda","sequence":"additional","affiliation":[{"name":"The Ohio State University Columbus, Ohio"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,4,13]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2008.01.047"},{"key":"e_1_3_2_1_2_1","volume-title":"Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety, Stanis\u0142aw Kozielski, Dariusz Mrozek, Pawe\u0142 Kasprowski, Bo\u017cena Ma\u0142ysiak-Mrozek","author":"Arefyeva Iya","unstructured":"Iya Arefyeva, David Broneske, Gabriel Campero, Marcus Pinnecke, and Gunter Saake. 2018. Memory Management Strategies in CPU\/GPU Database Systems: A Survey. In Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety, Stanis\u0142aw Kozielski, Dariusz Mrozek, Pawe\u0142 Kasprowski, Bo\u017cena Ma\u0142ysiak-Mrozek, and Daniel Kostrzewa (Eds.). Springer International Publishing, Cham, 128--142."},{"key":"e_1_3_2_1_3_1","volume-title":"25th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC).","author":"Awan A.","unstructured":"A. Awan, C. 
Chu, H. Subramoni, X. Lu, and D. Panda. 2018. OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. In 25th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)."},{"key":"e_1_3_2_1_4_1","volume-title":"Panda","author":"Awan Ammar Ahmad","year":"2018","unstructured":"Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, and Dhabaleswar K. Panda. 2018. Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?. In EuroMPI."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018769"},{"key":"e_1_3_2_1_6_1","volume-title":"2017 8th International Conference on Information and Communication Systems (ICICS).","author":"Balhaf K.","unstructured":"K. Balhaf, M. A. Alsmirat, M. Al-Ayyoub, Y. Jararweh, and M. A. Shehab. 2017. Accelerating Levenshtein and Damerau edit distance algorithms using GPU with unified memory. In 2017 8th International Conference on Information and Communication Systems (ICICS)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884045.2884050"},{"key":"e_1_3_2_1_8_1","volume-title":"http:\/\/www.cosmo-model.org\/content\/model\/general\/default.htm Accessed","author":"Model COSMO.","year":"2019","unstructured":"COSMO. 2019. COSMO-Model. 
http:\/\/www.cosmo-model.org\/content\/model\/general\/default.htm Accessed: March 26, 2019."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3092255.3092256"},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings, 11th European PVM\/MPI Users' Group Meeting","author":"Gabriel Edgar","unstructured":"Edgar Gabriel, Graham E. Fagg, George Bosilca, Thara Angskun, Jack J. Dongarra, Jeffrey M. Squyres, Vishal Sahay, Prabhanjan Kambadur, Brian Barrett, Andrew Lumsdaine, Ralph H. Castain, David J. Daniel, Richard L. Graham, and Timothy S. Woodall. 2004. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation. In Proceedings, 11th European PVM\/MPI Users' Group Meeting. Budapest, Hungary, 97--104."},{"key":"e_1_3_2_1_11_1","volume-title":"2016 IEEE 23rd International Conference on High Performance Computing (HiPC). 52--61","author":"Hamidouche K.","unstructured":"K. Hamidouche, A. A. Awan, A. Venkatesh, and D. K. Panda. 2016. CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC. In 2016 IEEE 23rd International Conference on High Performance Computing (HiPC). 52--61."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-017-2091-x"},{"key":"e_1_3_2_1_13_1","volume-title":"An Introduction to CUDA-Aware MPI. 
https:\/\/devblogs.nvidia.com\/introduction-cuda-aware-mpi\/ Accessed","author":"Kraus Jiri","year":"2019","unstructured":"Jiri Kraus. 2013. An Introduction to CUDA-Aware MPI. https:\/\/devblogs.nvidia.com\/introduction-cuda-aware-mpi\/ Accessed: March 26, 2019."},{"key":"e_1_3_2_1_14_1","volume-title":"2014 IEEE High Performance Extreme Computing Conference (HPEC). 1--6.","author":"Landaverde R.","unstructured":"R. Landaverde, Tiansheng Zhang, A. K. Coskun, and M. Herbordt. 2014. An investigation of Unified Memory Access performance in CUDA. In 2014 IEEE High Performance Extreme Computing Conference (HPEC). 1--6."},{"key":"e_1_3_2_1_15_1","volume-title":"An Evaluation of Unified Memory Technology on NVIDIA GPUs. In 2015 15th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing. 1092--1098","author":"Li W.","unstructured":"W. Li, G. Jin, X. Cui, and S. See. 2015. An Evaluation of Unified Memory Technology on NVIDIA GPUs. In 2015 15th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing. 1092--1098."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3085158.3086160"},{"key":"e_1_3_2_1_17_1","volume-title":"The ASCI UMT Benchmark Code. https:\/\/asc.llnl.gov\/computing_resources\/purple\/archive\/benchmarks\/umt Accessed","author":"LLNL.","year":"2019","unstructured":"LLNL. 2019. The ASCI UMT Benchmark Code. 
https:\/\/asc.llnl.gov\/computing_resources\/purple\/archive\/benchmarks\/umt Accessed: March 26, 2019."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00035"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3148173.3148184"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2463813"},{"key":"e_1_3_2_1_21_1","volume-title":"http:\/\/mvapich.cse.ohio-state.edu\/benchmarks\/ Accessed","author":"Computing Laboratory Network Based","year":"2019","unstructured":"Network Based Computing Laboratory. 2019. OSU Micro-benchmarks. http:\/\/mvapich.cse.ohio-state.edu\/benchmarks\/ Accessed: March 26, 2019."},{"key":"e_1_3_2_1_22_1","volume-title":"The Ohio State University","author":"Computing Laboratory Network-Based","year":"2001","unstructured":"Network-Based Computing Laboratory, The Ohio State University. 2001. MVAPICH: MPI over InfiniBand, Omni-Path, Ethernet\/iWARP, and RoCE. http:\/\/mvapich.cse.ohio-state.edu\/ Accessed: March 26, 2019."},{"key":"e_1_3_2_1_23_1","volume-title":"https:\/\/developer.nvidia.com\/gpudirect Accessed","author":"Direct NVIDIA.","year":"2019","unstructured":"NVIDIA. 2011. NVIDIA GPUDirect. https:\/\/developer.nvidia.com\/gpudirect Accessed: March 26, 2019."},{"key":"e_1_3_2_1_24_1","volume-title":"Whitepaper: NVIDIA Tesla P100, section 'Unified Memory'. 
https:\/\/images.nvidia.com\/content\/pdf\/tesla\/whitepaper\/pascal-architecture-whitepaper.pdf Accessed","author":"NVIDIA.","year":"2016","unstructured":"NVIDIA. 2016. Whitepaper: NVIDIA Tesla P100, section 'Unified Memory'. https:\/\/images.nvidia.com\/content\/pdf\/tesla\/whitepaper\/pascal-architecture-whitepaper.pdf Accessed: March 26, 2019."},{"key":"e_1_3_2_1_25_1","volume-title":"Whitepaper: NVIDIA Tesla V100 GPU ARCHITECTURE, section 'UNIFIED MEMORY AND ADDRESS TRANSLATION SERVICES'","author":"NVIDIA.","year":"2017","unstructured":"NVIDIA. 2017. Whitepaper: NVIDIA Tesla V100 GPU ARCHITECTURE, section 'UNIFIED MEMORY AND ADDRESS TRANSLATION SERVICES'. http:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf Accessed: March 26, 2019."},{"key":"e_1_3_2_1_26_1","volume-title":"https:\/\/developer.nvidia.com\/about-cuda Accessed","author":"Developer NVIDIA.","year":"2019","unstructured":"NVIDIA. 2019. About CUDA | NVIDIA Developer. https:\/\/developer.nvidia.com\/about-cuda Accessed: March 26, 2019."},{"key":"e_1_3_2_1_27_1","volume-title":"Unified Memory in CUDA 6. https:\/\/devblogs.nvidia.com\/unified-memory-in-cuda-6\/ Accessed","author":"NVIDIA.","year":"2019","unstructured":"NVIDIA. 2019. Unified Memory in CUDA 6. 
https:\/\/devblogs.nvidia.com\/unified-memory-in-cuda-6\/ Accessed: March 26, 2019."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.17"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.17"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2012.228"},{"key":"e_1_3_2_1_31_1","volume-title":"2017 4th International Conference on Control, Decision and Information Technologies (CoDIT). 0539--0543","author":"Rizvi S. T. H.","unstructured":"S. T. H. Rizvi, G. Cabodi, and G. Francini. 2017. GPU-only unified ConvMM layer for neural classifiers. In 2017 4th International Conference on Control, Decision and Information Technologies (CoDIT). 0539--0543."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/3014904.3015015"},{"key":"e_1_3_2_1_33_1","volume-title":"Everything You Need to Know About Unified Memory. (March","author":"Sakharnykh Nikolay","year":"2018","unstructured":"Nikolay Sakharnykh. 2018. Everything You Need to Know About Unified Memory. (March 2018). http:\/\/on-demand.gputechconf.com\/gtc\/2018\/presentation\/s8430-everything-you-need-to-know-about-unified-memory.pdf"},{"key":"e_1_3_2_1_34_1","volume-title":"Danilo Medeiros Eler, and Rog\u00e9rio Eduardo Garcia","author":"Santos Rafael Silva","year":"2016","unstructured":"Rafael Silva Santos, Danilo Medeiros Eler, and Rog\u00e9rio Eduardo Garcia. 2016. Performance Evaluation of Data Migration Methods Between the Host and the Device in CUDA-Based Programming. 
In Information Technology: New Generations, Shahram Latifi (Ed.). Springer International Publishing, Cham, 689--700."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.222"},{"key":"e_1_3_2_1_36_1","unstructured":"xBus. 2019. xBus - Overview. http:\/\/xbus.sourceforge.net Accessed: March 26, 2019."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2996190"}],"event":{"name":"ASPLOS '19: Architectural Support for Programming Languages and Operating Systems","location":"Providence RI USA","acronym":"ASPLOS '19","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages","SIGOPS ACM Special Interest Group on Operating Systems","SIGARCH ACM Special Interest Group on Computer Architecture","SIGBED ACM Special Interest Group on Embedded Systems"]},"container-title":["Proceedings of the 12th Workshop on General Purpose Processing Using 
GPUs"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3300053.3319419","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3300053.3319419","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:23:51Z","timestamp":1750202631000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3300053.3319419"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,13]]},"references-count":37,"alternative-id":["10.1145\/3300053.3319419","10.1145\/3300053"],"URL":"https:\/\/doi.org\/10.1145\/3300053.3319419","relation":{},"subject":[],"published":{"date-parts":[[2019,4,13]]},"assertion":[{"value":"2019-04-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}