{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T20:46:08Z","timestamp":1764103568261,"version":"3.41.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T00:00:00Z","timestamp":1739232000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1R01GM140316"],"award-info":[{"award-number":["1R01GM140316"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-1513126"],"award-info":[{"award-number":["CNS-1513126"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Jiangxi Provincial Key R&D Program, China","award":["012031379055"],"award-info":[{"award-number":["012031379055"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>Massively parallel systems, such as Graphics Processing Units (GPUs), play an increasingly crucial role in today\u2019s data-intensive computing. The unique challenges associated with developing system software for massively parallel hardware to support numerous parallel threads efficiently are of paramount importance. One such challenge is the design of a dynamic memory allocator to allocate memory at runtime. Traditionally, memory allocators have relied on maintaining a global data structure, such as a queue of free pages. 
However, in the context of massively parallel systems, accessing such global data structures can quickly become a bottleneck even with multiple queues in place. This paper presents a novel approach to dynamic memory allocation that eliminates the need for a centralized data structure. Our proposed approach revolves around letting threads employ random search procedures to locate free pages. Through mathematical proofs and extensive experiments, we demonstrate that the basic random search design achieves lower latency than the best-known existing solution, Ouroboros, in most situations. Furthermore, we develop more advanced techniques and algorithms to tackle the challenge of warp divergence and further enhance performance when free memory is limited. Building upon these advancements, our mathematical proofs and experimental results affirm that these advanced designs can yield an order of magnitude improvement over the basic design and consistently outperform the state-of-the-art by up to two orders of magnitude. To illustrate the practical implications of our work, we integrate our memory management techniques into two GPU algorithms: a hash join and a group-by. 
Both case studies provide compelling evidence of our approach\u2019s pronounced performance gains.<\/jats:p>","DOI":"10.1145\/3701623","type":"journal-article","created":{"date-parts":[[2024,11,11]],"date-time":"2024-11-11T10:56:47Z","timestamp":1731322607000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Dynamic Buffer Management in Massively Parallel Systems: The Power of Randomness"],"prefix":"10.1145","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3981-0951","authenticated-orcid":false,"given":"Minh","family":"Pham","sequence":"first","affiliation":[{"name":"Computer Science & Engineering, University of South Florida, Tampa, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9686-6049","authenticated-orcid":false,"given":"Yongke","family":"Yuan","sequence":"additional","affiliation":[{"name":"Beijing University of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8425-4459","authenticated-orcid":false,"given":"Hao","family":"Li","sequence":"additional","affiliation":[{"name":"University of South Florida, Tampa, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2651-1696","authenticated-orcid":false,"given":"Chengcheng","family":"Mou","sequence":"additional","affiliation":[{"name":"University of South Florida, Tampa, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4062-2694","authenticated-orcid":false,"given":"Yicheng","family":"Tu","sequence":"additional","affiliation":[{"name":"University of South Florida, Tampa, United 
States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9293-8028","authenticated-orcid":false,"given":"Zichen","family":"Xu","sequence":"additional","affiliation":[{"name":"Nanchang University, Nanchang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9270-8370","authenticated-orcid":false,"given":"Jinghan","family":"Meng","sequence":"additional","affiliation":[{"name":"University of South Florida, Tampa, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"16,806,886","volume-title":"Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables","author":"Abramowitz Milton","unstructured":"Milton Abramowitz and Irene A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover Publications, 16,806,886."},{"key":"e_1_3_2_3_2","volume-title":"GPU Technology Conference (GTC\u201914)","volume":"152","author":"Adinetz Andrew V.","year":"2014","unstructured":"Andrew V. Adinetz and Dirk Pleiter. 2014. Halloc: A high-throughput dynamic memory allocator for GPGPU architectures. In GPU Technology Conference (GTC\u201914), Vol. 152."},{"key":"e_1_3_2_4_2","doi-asserted-by":"crossref","unstructured":"Yehia Arafa Abdel-Hameed Badawy Gopinath Chennupati Nandakishore Santhi and Stephan Eidenbenz. 2019. Low Overhead Instruction Latency Characterization for NVIDIA GPGPUs. (2019). arxiv:cs.DC\/1905.08778","DOI":"10.1109\/HPEC.2019.8916466"},{"key":"e_1_3_2_5_2","first-page":"128","volume-title":"International Conference: Beyond Databases, Architectures and Structures","author":"Arefyeva Iya","year":"2018","unstructured":"Iya Arefyeva, David Broneske, Gabriel Campero, Marcus Pinnecke, and Gunter Saake. 2018. Memory management strategies in CPU\/GPU database systems: A survey. 
In International Conference: Beyond Databases, Architectures and Structures. Springer, 128\u2013142."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3108139"},{"key":"e_1_3_2_7_2","volume-title":"IMPACT 2020, in conjunction with HiPEAC 2020","author":"Baroudi Toufik","year":"2020","unstructured":"Toufik Baroudi, Vincent Loechner, and Rachid Seghir. 2020. Static versus dynamic memory allocation: A comparison for linear algebra kernels. In IMPACT 2020, in conjunction with HiPEAC 2020."},{"issue":"11","key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1145\/356989.357000","article-title":"Hoard: A scalable memory allocator for multithreaded applications","volume":"35","author":"Berger Emery D.","year":"2000","unstructured":"Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. 2000. Hoard: A scalable memory allocator for multithreaded applications. ACM SIGPLAN Notices 35, 11 (2000), 117\u2013128.","journal-title":"ACM SIGPLAN Notices"},{"issue":"2","key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1145\/235968.233351","article-title":"Goal-oriented buffer management revisited","volume":"25","author":"Brown Kurt P.","year":"1996","unstructured":"Kurt P. Brown, Michael J. Carey, and Miron Livny. 1996. Goal-oriented buffer management revisited. ACM SIGMOD Record 25, 2 (1996), 353\u2013364.","journal-title":"ACM SIGMOD Record"},{"key":"e_1_3_2_10_2","first-page":"604","article-title":"Approximants of the Euler\u2013Mascheroni constant and harmonic numbers","volume":"222","author":"Buri\u0107 Tomislav","year":"2013","unstructured":"Tomislav Buri\u0107 and Neven Elezovi\u0107. 2013. Approximants of the Euler\u2013Mascheroni constant and harmonic numbers. Appl. Math. Comput. 222 (2013), 604\u2013611.","journal-title":"Appl. Math. 
Comput."},{"key":"e_1_3_2_11_2","first-page":"1","volume-title":"2018 IEEE High Performance extreme Computing Conference (HPEC\u201918)","author":"Busato Federico","year":"2018","unstructured":"Federico Busato, Oded Green, Nicola Bombieri, and David A. Bader. 2018. Hornet: An efficient data structure for dynamic sparse graphs and matrices on GPUs. In 2018 IEEE High Performance extreme Computing Conference (HPEC\u201918). IEEE, 1\u20137."},{"key":"e_1_3_2_12_2","volume-title":"Adaptive Database Buffer Allocation using Query Feedback","author":"Chen ChungMin Melvin","year":"1998","unstructured":"ChungMin Melvin Chen and Nick Roussopoulos. 1998. Adaptive Database Buffer Allocation using Query Feedback. Technical Report."},{"issue":"1","key":"e_1_3_2_13_2","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1007\/BF01840450","article-title":"An evaluation of buffer management strategies for relational database systems","volume":"1","author":"Chou Hong-Tai","year":"1986","unstructured":"Hong-Tai Chou and David J. DeWitt. 1986. An evaluation of buffer management strategies for relational database systems. Algorithmica 1, 1-4 (1986), 311\u2013336.","journal-title":"Algorithmica"},{"key":"e_1_3_2_14_2","first-page":"185","volume-title":"Proceedings of the November 30\u2013December 1, 1965, Fall Joint Computer Conference, Part I","author":"Corbat\u00f3 Fernando J.","year":"1965","unstructured":"Fernando J. Corbat\u00f3 and Victor A. Vyssotsky. 1965. Introduction and overview of the Multics system. In Proceedings of the November 30\u2013December 1, 1965, Fall Joint Computer Conference, Part I. 185\u2013196."},{"issue":"4","key":"e_1_3_2_15_2","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1145\/321296.321310","article-title":"Segmentation and the design of multiprogrammed computer systems","volume":"12","author":"Dennis Jack B.","year":"1965","unstructured":"Jack B. Dennis. 1965. Segmentation and the design of multiprogrammed computer systems. 
Journal of the ACM (JACM) 12, 4 (1965), 589\u2013602.","journal-title":"Journal of the ACM (JACM)"},{"key":"e_1_3_2_16_2","unstructured":"Niall Douglas. 2006. ned Productions - Nedmalloc. https:\/\/www.nedprod.com\/programs\/portable\/nedmalloc (2006). Retrieved February 01 2024."},{"issue":"4","key":"e_1_3_2_17_2","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1145\/1994.2022","article-title":"Principles of database buffer management","volume":"9","author":"Effelsberg Wolfgang","year":"1984","unstructured":"Wolfgang Effelsberg and Theo Haerder. 1984. Principles of database buffer management. ACM Transactions on Database Systems (TODS) 9, 4 (1984), 560\u2013595.","journal-title":"ACM Transactions on Database Systems (TODS)"},{"key":"e_1_3_2_18_2","unstructured":"J. Evans. 2006. jemalloc: A Scalable Concurrent Malloc(3) Implementation. https:\/\/github.com\/jemalloc\/jemalloc (2006). Retrieved February 01 2024."},{"key":"e_1_3_2_19_2","first-page":"393","volume-title":"2016 USENIX Annual Technical Conference (USENIX\u201916)","author":"Falsafi Babak","year":"2016","unstructured":"Babak Falsafi, Rachid Guerraoui, Javier Picorel, and Vasileios Trigonakis. 2016. Unlocking energy. In 2016 USENIX Annual Technical Conference (USENIX\u201916). 393\u2013406."},{"key":"e_1_3_2_20_2","unstructured":"W. Gloger. 2006. Ptmalloc3 - a Multi-thread Malloc Implementation. https:\/\/github.com\/Cloudifold\/ptmalloc (2006). Retrieved February 01 2024."},{"key":"e_1_3_2_21_2","unstructured":"Google. 2024. TCMalloc. https:\/\/github.com\/google\/tcmalloc (2024). Retrieved February 01 2024."},{"issue":"4","key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1620585.1620588","article-title":"Relational query coprocessing on graphics processors","volume":"34","author":"He Bingsheng","year":"2009","unstructured":"Bingsheng He, Mian Lu, Ke Yang, Rui Fang, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. 2009. 
Relational query coprocessing on graphics processors. ACM Transactions on Database Systems (TODS) 34, 4 (2009), 1\u201339.","journal-title":"ACM Transactions on Database Systems (TODS)"},{"key":"e_1_3_2_23_2","first-page":"511","volume-title":"Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD\u201908)","author":"He Bingsheng","year":"2008","unstructured":"Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga Govindaraju, Qiong Luo, and Pedro Sander. 2008. Relational joins on graphics processors. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD\u201908). ACM, New York, NY, USA, 511\u2013524. 10.1145\/1376616.1376670"},{"key":"e_1_3_2_24_2","doi-asserted-by":"crossref","first-page":"1134","DOI":"10.1109\/CIT.2010.206","volume-title":"2010 10th IEEE International Conference on Computer and Information Technology","author":"Huang Xiaohuang","year":"2010","unstructured":"Xiaohuang Huang, Christopher I. Rodrigues, Stephen Jones, Ian Buck, and Wen-mei Hwu. 2010. Xmalloc: A scalable lock-free dynamic memory allocator for many-core machines. In 2010 10th IEEE International Conference on Computer and Information Technology. IEEE, 1134\u20131139."},{"key":"e_1_3_2_25_2","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201921)","author":"Hunter Andrew Hamilton","year":"2021","unstructured":"Andrew Hamilton Hunter, Chris Kennelly, Darryl Gove, Parthasarathy Ranganathan, Paul Jack Turner, and Tipp James Moseley. 2021. Beyond malloc efficiency to fleet efficiency: A hugepage-aware memory allocator. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201921)."},{"key":"e_1_3_2_26_2","volume-title":"A First Course in Stochastic Processes","author":"Karlin Samuel","year":"2014","unstructured":"Samuel Karlin. 2014. A First Course in Stochastic Processes. 
Academic Press."},{"issue":"2","key":"e_1_3_2_27_2","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1109\/TEC.1962.5219356","article-title":"One-level storage system","author":"Kilburn Tom","year":"1962","unstructured":"Tom Kilburn, David B. G. Edwards, Michael J. Lanigan, and Frank H. Sumner. 1962. One-level storage system. IRE Transactions on Electronic Computers 2 (1962), 223\u2013235.","journal-title":"IRE Transactions on Electronic Computers"},{"issue":"203","key":"e_1_3_2_28_2","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1090\/S0025-5718-1993-1197512-7","article-title":"Johann Faulhaber and sums of powers","volume":"61","author":"Knuth Donald E.","year":"1993","unstructured":"Donald E. Knuth. 1993. Johann Faulhaber and sums of powers. Math. Comp. 61, 203 (1993), 277\u2013294.","journal-title":"Math. Comp."},{"key":"e_1_3_2_29_2","unstructured":"Doug Lea. 1996. A Memory Allocator. http:\/\/gee.cs.oswego.edu\/dl\/html\/malloc.html (1996). Accessed: 2024-02-01."},{"key":"e_1_3_2_30_2","first-page":"301","volume-title":"Big Data (Big Data), 2014 IEEE International Conference on","author":"Li H.","year":"2014","unstructured":"H. Li, D. Yu, A. Kumar, and Y. Tu. 2014. Performance modeling in CUDA streams - A means for high-throughput data processing. In Big Data (Big Data), 2014 IEEE International Conference on. 301\u2013310."},{"key":"e_1_3_2_31_2","first-page":"327","volume-title":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","author":"Liu Tongping","year":"2011","unstructured":"Tongping Liu, Charlie Curtsinger, and Emery D. Berger. 2011. Dthreads: Efficient deterministic multithreading. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. 
327\u2013336."},{"key":"e_1_3_2_32_2","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1145\/996841.996848","volume-title":"Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI\u201904)","author":"Michael Maged M.","year":"2004","unstructured":"Maged M. Michael. 2004. Scalable lock-free dynamic memory allocation. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI\u201904). Association for Computing Machinery, New York, NY, USA, 35\u201346. 10.1145\/996841.996848"},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","first-page":"1935","DOI":"10.1145\/2882903.2915224","volume-title":"Proceedings of the 2016 International Conference on Management of Data","author":"Paul Johns","year":"2016","unstructured":"Johns Paul, Jiong He, and Bingsheng He. 2016. GPL: A GPU-based pipelined query processing engine. In Proceedings of the 2016 International Conference on Management of Data. ACM, 1935\u20131950."},{"key":"e_1_3_2_34_2","volume-title":"Proceedings of the 36th ACM International Conference on Supercomputing (ICS\u201922)","author":"Pham Minh","year":"2022","unstructured":"Minh Pham, Hao Li, Yongke Yuan, Chengcheng Mou, Kandethody Ramachandran, Zichen Xu, and Yicheng Tu. 2022. Dynamic memory management in massively parallel systems: A case on GPUs. In Proceedings of the 36th ACM International Conference on Supercomputing (ICS\u201922). Association for Computing Machinery, New York, NY, USA, Article 24, 13 pages. 10.1145\/3524059.3532387"},{"key":"e_1_3_2_35_2","volume-title":"Proofs That Really Count: The Art of Combinatorial Proof","author":"Quinn Jennifer J.","year":"2003","unstructured":"Jennifer J. Quinn and Arthur T. Benjamin. 2003. Proofs That Really Count: The Art of Combinatorial Proof. 
The Mathematical Association of America."},{"issue":"4","key":"e_1_3_2_36_2","first-page":"708","article-title":"Efficient join algorithms for large database tables in a multi-GPU environment","volume":"14","author":"Rui Ran","year":"2021","unstructured":"Ran Rui, Hao Li, and Yi-Cheng Tu. 2021. Efficient join algorithms for large database tables in a multi-GPU environment. Proceedings of the VLDB Endowment 14, 4 (2021), 708\u2013720.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_2_37_2","volume-title":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management (SSDBM\u201917)","author":"Rui Ran","year":"2017","unstructured":"Ran Rui and Yi-Cheng Tu. 2017. Fast equi-join algorithms on GPUs: Design and implementation. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management (SSDBM\u201917). ACM, New York, NY, USA, Article 17, 12 pages. 10.1145\/3085504.3085521"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","unstructured":"Scott Schneider Christos D. Antonopoulos and Dimitrios S. Nikolopoulos. 2006. Scalable locality-conscious multithreaded memory allocation(ISMM\u201906). Association for Computing Machinery New York NY USA 84\u201394. 10.1145\/1133956.1133968","DOI":"10.1145\/1133956.1133968"},{"key":"e_1_3_2_39_2","article-title":"DynaSOAr: A parallel memory allocator for object-oriented programming on GPUs with efficient memory access","author":"Springer Matthias","year":"2018","unstructured":"Matthias Springer and Hidehiko Masuhara. 2018. DynaSOAr: A parallel memory allocator for object-oriented programming on GPUs with efficient memory access. 
arXiv preprint arXiv:1810.11765 (2018).","journal-title":"arXiv preprint arXiv:1810.11765"},{"key":"e_1_3_2_40_2","first-page":"1","volume-title":"2012 Innovative Parallel Computing (InPar\u201912)","author":"Steinberger Markus","year":"2012","unstructured":"Markus Steinberger, Michael Kenzel, Bernhard Kainz, and Dieter Schmalstieg. 2012. ScatterAlloc: Massively parallel dynamic memory allocation for the GPU. In 2012 Innovative Parallel Computing (InPar\u201912). IEEE, 1\u201310."},{"issue":"7","key":"e_1_3_2_41_2","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1145\/358699.358703","article-title":"Operating system support for database management","volume":"24","author":"Stonebraker Michael","year":"1981","unstructured":"Michael Stonebraker. 1981. Operating system support for database management. Commun. ACM 24, 7 (1981), 412\u2013418.","journal-title":"Commun. ACM"},{"key":"e_1_3_2_42_2","unstructured":"RD Team et\u00a0al. RAPIDS: Collection of Libraries for End to End GPU Data Science 2018."},{"key":"e_1_3_2_43_2","first-page":"143","volume-title":"Computer Graphics Forum","author":"Vinkler Marek","year":"2015","unstructured":"Marek Vinkler and Vlastimil Havran. 2015. Register efficient dynamic memory allocator for GPUs. In Computer Graphics Forum, Vol. 34. Wiley Online Library, 143\u2013154."},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1145\/2458523.2458535","volume-title":"Proceedings of the 6th Workshop on General Purpose Processor using Graphics Processing Units","author":"Widmer Sven","year":"2013","unstructured":"Sven Widmer, Dominik Wodniok, Nicolas Weber, and Michael Goesele. 2013. Fast dynamic memory allocator for massively parallel architectures. In Proceedings of the 6th Workshop on General Purpose Processor using Graphics Processing Units. 
120\u2013126."},{"key":"e_1_3_2_45_2","first-page":"1","volume-title":"Proceedings of the 34th ACM International Conference on Supercomputing","author":"Winter Martin","year":"2020","unstructured":"Martin Winter, Daniel Mlakar, Mathias Parger, and Markus Steinberger. 2020. Ouroboros: Virtualized queues for dynamic memory management on GPUs. In Proceedings of the 34th ACM International Conference on Supercomputing. 1\u201312."},{"key":"e_1_3_2_46_2","doi-asserted-by":"crossref","first-page":"754","DOI":"10.1109\/SC.2018.00063","volume-title":"SC18: International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Winter Martin","year":"2018","unstructured":"Martin Winter, Daniel Mlakar, Rhaleb Zayer, Hans-Peter Seidel, and Markus Steinberger. 2018. faimGraph: High performance management of fully-dynamic graphs under tight memory constraints on the GPU. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 754\u2013766."},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1145\/3437801.3441612","volume-title":"Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","author":"Winter Martin","year":"2021","unstructured":"Martin Winter, Mathias Parger, Daniel Mlakar, and Markus Steinberger. 2021. Are dynamic memory managers on GPUs slow? A survey and benchmarks. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 219\u2013233."},{"issue":"10","key":"e_1_3_2_48_2","doi-asserted-by":"crossref","first-page":"817","DOI":"10.14778\/2536206.2536210","article-title":"The Yin and Yang of processing data warehousing queries on GPU devices","volume":"6","author":"Yuan Yuan","year":"2013","unstructured":"Yuan Yuan, Rubao Lee, and Xiaodong Zhang. 2013. The Yin and Yang of processing data warehousing queries on GPU devices. 
Proceedings of the VLDB Endowment 6, 10 (2013), 817\u2013828.","journal-title":"Proceedings of the VLDB Endowment"},{"issue":"4","key":"e_1_3_2_49_2","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1037\/1082-989X.12.4.399","article-title":"Toward using confidence intervals to compare correlations.","volume":"12","author":"Zou Guang Yong","year":"2007","unstructured":"Guang Yong Zou. 2007. Toward using confidence intervals to compare correlations. Psychological Methods 12, 4 (2007), 399.","journal-title":"Psychological Methods"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701623","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3701623","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:10:05Z","timestamp":1750295405000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701623"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,11]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3701623"],"URL":"https:\/\/doi.org\/10.1145\/3701623","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2025,2,11]]},"assertion":[{"value":"2023-08-21","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-02-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}