{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:49:25Z","timestamp":1750308565508,"version":"3.41.0"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2014,12,8]],"date-time":"2014-12-08T00:00:00Z","timestamp":1417996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100006245","name":"Ministry of Science and Technology, Israel","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006245","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,1,9]]},"abstract":"<jats:p>GPUs play an increasingly important role in high-performance computing. While developing naive code is straightforward, optimizing massively parallel applications requires deep understanding of the underlying architecture. The developer must struggle with complex index calculations and manual memory transfers. This article classifies memory access patterns used in most parallel algorithms, based on Berkeley\u2019s Parallel \u201cDwarfs.\u201d It then proposes the MAPS framework, a device-level memory abstraction that facilitates memory access on GPUs, alleviating complex indexing using on-device containers and iterators. 
This article presents an implementation of MAPS and shows that its performance is comparable to carefully optimized implementations of real-world applications.<\/jats:p>","DOI":"10.1145\/2680544","type":"journal-article","created":{"date-parts":[[2014,12,8]],"date-time":"2014-12-08T16:17:14Z","timestamp":1418055434000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["MAPS"],"prefix":"10.1145","volume":"11","author":[{"given":"Eri","family":"Rubin","sequence":"first","affiliation":[{"name":"The Hebrew University of Jerusalem, Jerusalem, Israel"}]},{"given":"Ely","family":"Levy","sequence":"additional","affiliation":[{"name":"The Hebrew University of Jerusalem, Jerusalem, Israel"}]},{"given":"Amnon","family":"Barak","sequence":"additional","affiliation":[{"name":"The Hebrew University of Jerusalem, Jerusalem, Israel"}]},{"given":"Tal","family":"Ben-Nun","sequence":"additional","affiliation":[{"name":"The Hebrew University of Jerusalem, Jerusalem, Israel"}]}],"member":"320","published-online":{"date-parts":[[2014,12,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1572769.1572792"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1769331.1769344"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063400"},{"key":"e_1_2_1_5_1","unstructured":"Boost. 2014. Boost C++ Libraries. Retrieved from http:\/\/www.boost.org\/."},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Martin Burtscher and Keshav Pingali. 2011. An efficient CUDA implementation of the tree-based barnes hut n-body algorithm. 
In GPU Computing Gems Emerald Edition. Morgan Kaufmann 75--92.","DOI":"10.1016\/B978-0-12-384988-5.00006-1"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063401"},{"key":"e_1_2_1_8_1","unstructured":"CUB. 2014. CUB GPU Computing Primitives Library NVIDIA Research. Retrieved from http:\/\/nvlabs.github.io\/cub\/."},{"key":"e_1_2_1_9_1","unstructured":"CUBLAS. 2014. CUBLAS Library Documentation. Retrieved from http:\/\/docs.nvidia.com\/cuda\/cublas\/."},{"key":"e_1_2_1_10_1","unstructured":"CUDA. 2014. NVIDIA CUDA SDK. (2014). http:\/\/www.nvidia.com\/cuda."},{"key":"e_1_2_1_11_1","unstructured":"CUFFT. 2014. CUFFT Library Documentation. Retrieved from http:\/\/docs.nvidia.com\/cuda\/cufft\/."},{"volume-title":"Proceedings of the 43rd International Conference on Parallel Processing (ICPP\u201914)","author":"Fang Jianbin","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1391469.1391473"},{"key":"e_1_2_1_14_1","unstructured":"Kate Gregory and Ade Miller. 2012. C++ AMP: Accelerated Massive Parallelism with Microsoft\u00ae Visual C++\u00ae. Microsoft Press."},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Kshitij Gupta Jeff A. Stuart and John D. Owens. 2012. 
A study of persistent threads style GPU programming for GPGPU workloads. In Innovative Parallel Computing.","DOI":"10.1109\/InPar.2012.6339596"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2010.62"},{"volume-title":"CUDPP: CUDA Data Parallel Primitives Library.","year":"2007","author":"Harris Mark","key":"e_1_2_1_17_1"},{"volume-title":"Thrust: A Parallel Template Library.","year":"2010","author":"Hoberock Jared","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941553.1941590"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/311535.311565"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.36"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1122501.1122505"},{"key":"e_1_2_1_23_1","unstructured":"MAPS. 2014. MAPS Code Repository. Retrieved from https:\/\/github.com\/erdooom\/MAPS."},{"key":"e_1_2_1_24_1","unstructured":"David R. Musser Gilmer J. Derge and Atul Saini. 2001. STL Tutorial and Reference Guide Second Edition: C++ Programming with the Standard Template Library. Addison-Wesley Longman Boston MA."},{"key":"e_1_2_1_25_1","unstructured":"NPP. 2014. NVIDIA Performance Primitives (NPP) Library. Retrieved from http:\/\/developer.nvidia.com\/npp\/."},{"volume-title":"GPU Gems 3","author":"Nyland Lars","key":"e_1_2_1_26_1"},{"key":"e_1_2_1_27_1","unstructured":"OpenACC. 2012. OpenACC\u2014Directives for Accelerators. 
Retrieved from http:\/\/www.openacc-standard.org."},{"key":"e_1_2_1_28_1","unstructured":"Xavier Provot. 1995. Deformation constraints in a mass-spring model to describe rigid cloth behavior. In Graphics Interface. 147--154."},{"key":"e_1_2_1_29_1","unstructured":"Greg Ruetsch and Paulius Micikevicius. 2009. Optimizing Matrix Transpose in CUDA. Retrieved from https:\/\/users.csc.calpoly.edu\/clupo\/teaching\/419\/winter14\/MatrixTranspose.pdf."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1275808.1276489"},{"key":"e_1_2_1_31_1","unstructured":"Bjarne Stroustrup. 2000. The C++ Programming Language (3rd ed.). 
Addison-Wesley Longman Boston MA."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1810085.1810104"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2680544","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2680544","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:04:16Z","timestamp":1750273456000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2680544"}},"subtitle":["Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction"],"short-title":[],"issued":{"date-parts":[[2014,12,8]]},"references-count":31,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,1,9]]}},"alternative-id":["10.1145\/2680544"],"URL":"https:\/\/doi.org\/10.1145\/2680544","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2014,12,8]]},"assertion":[{"value":"2014-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-12-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}