{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:29:05Z","timestamp":1750220945510,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,2,17]],"date-time":"2019-02-17T00:00:00Z","timestamp":1550361600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000181","name":"Air Force Office of Scientific Research","doi-asserted-by":"publisher","award":["FA9550-17-1-0367"],"award-info":[{"award-number":["FA9550-17-1-0367"]}],"id":[{"id":"10.13039\/100000181","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,2,17]]},"DOI":"10.1145\/3303084.3309488","type":"proceedings-article","created":{"date-parts":[[2019,2,19]],"date-time":"2019-02-19T20:59:18Z","timestamp":1550609958000},"page":"11-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Don't Forget About Synchronization!"],"prefix":"10.1145","author":[{"given":"Jacob","family":"Nelson","sequence":"first","affiliation":[{"name":"Computer Science and Engineering, Lehigh University, USA"}]},{"given":"Roberto","family":"Palmieri","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, Lehigh University, USA"}]}],"member":"320","published-online":{"date-parts":[[2019,2,17]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2015. GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell. https:\/\/devblogs.nvidia.com\/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell\/  2015. GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell. https:\/\/devblogs.nvidia.com\/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell\/"},{"key":"e_1_3_2_1_2_1","unstructured":"2017. Try to use lock and unlock in CUDA. https:\/\/devtalk.nvidia.com\/default\/topic\/1014009\/try-to-use-lock-and-unlock-in-cuda\/  2017. Try to use lock and unlock in CUDA. https:\/\/devtalk.nvidia.com\/default\/topic\/1014009\/try-to-use-lock-and-unlock-in-cuda\/"},{"key":"e_1_3_2_1_3_1","unstructured":"2018. KMCUDA. https:\/\/github.com\/src-d\/kmcuda  2018. KMCUDA. https:\/\/github.com\/src-d\/kmcuda"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2775054.2694391"},{"key":"e_1_3_2_1_5_1","unstructured":"AMD. 2016. ROCm a New Era in Open GPU Computing. https:\/\/rocm.github.io  AMD. 2016. ROCm a New Era in Open GPU Computing. https:\/\/rocm.github.io"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735706"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2741948.2741962"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629575.1629579"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2012.16"},{"key":"e_1_3_2_1_10_1","unstructured":"Bryan Catanzaro and Levi Barnes. 2015. NVIDIA K-means. https:\/\/github.com\/NVIDIA\/kmeans  Bryan Catanzaro and Levi Barnes. 2015. NVIDIA K-means. https:\/\/github.com\/NVIDIA\/kmeans"},{"key":"e_1_3_2_1_11_1","unstructured":"Daniel Cederman Philippas Tsigas and Muhammad Tayyab Chaudhry. {n. d.}. Towards a Software Transactional Memory for Graphics Processors.  Daniel Cederman Philippas Tsigas and Muhammad Tayyab Chaudhry. {n. d.}. Towards a Software Transactional Memory for Graphics Processors."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080204"},{"volume-title":"Neural networks: Tricks of the trade","author":"Coates Adam","key":"e_1_3_2_1_13_1"},{"key":"e_1_3_2_1_14_1","unstructured":"Gabriella Csurka Christopher Dance Lixin Fan Jutta Willamowski and C\u00e9dric Bray. {n. d.}. Visual categorization with bags of keypoints.  Gabriella Csurka Christopher Dance Lixin Fan Jutta Willamowski and C\u00e9dric Bray. {n. d.}. Visual categorization with bags of keypoints."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Inderjit S Dhillon and Dharmendra S Modha. 2001. Concept decompositions for large sparse text data using clustering. Machine learning 42 1--2 (2001) 143--175.  Inderjit S Dhillon and Dharmendra S Modha. 2001. Concept decompositions for large sparse text data using clustering. Machine learning 42 1--2 (2001) 143--175.","DOI":"10.1023\/A:1007612920971"},{"volume-title":"International Conference on Machine Learning. 579--587","year":"2015","author":"Ding Yufei","key":"e_1_3_2_1_16_1"},{"volume-title":"2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 1--14","author":"ElTantawy A.","key":"e_1_3_2_1_17_1"},{"volume-title":"Warp Scheduling for Fine-Grained Synchronization. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). 375--388","author":"ElTantawy A.","key":"e_1_3_2_1_18_1"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540743"},{"key":"e_1_3_2_1_20_1","unstructured":"Greg Hamerly and Charles Elkan. 2004. Learning the k in k-means. In Advances in neural information processing systems. 281--288.   Greg Hamerly and Charles Elkan. 2004. Learning the k in k-means. In Advances in neural information processing systems. 281--288."},{"key":"e_1_3_2_1_21_1","unstructured":"Mark Harris. 2013. Using Shared Memory in CUDA C\/C++. https:\/\/devblogs.nvidia.com\/using-shared-memory-cuda-cc\/  Mark Harris. 2013. Using Shared Memory in CUDA C\/C++. https:\/\/devblogs.nvidia.com\/using-shared-memory-cuda-cc\/"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.27"},{"volume-title":"2018 International Workshop on Advanced Image Technology (IWAIT). 1--4.","author":"Lai S. C.","key":"e_1_3_2_1_23_1"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751232"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIT.2010.60"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/1690219.1690290"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.31"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1982.1056489"},{"key":"e_1_3_2_1_29_1","unstructured":"Vadim Markovtsev. 2016. Towards Yinyang K-means on GPU. https:\/\/blog.sourced.tech\/post\/towards_kmeans_on_gpu\/  Vadim Markovtsev. 2016. Towards Yinyang K-means on GPU. https:\/\/blog.sourced.tech\/post\/towards_kmeans_on_gpu\/"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2008.4636089"},{"key":"e_1_3_2_1_31_1","unstructured":"NVIDIA. 2018. Compute Capability 7.x. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html#compute-capability-7-x  NVIDIA. 2018. Compute Capability 7.x. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html#compute-capability-7-x"},{"key":"e_1_3_2_1_32_1","unstructured":"Michael Ortega-Binderberger Kriengkrai Porkaew and Sharad Mehrotra. 1999. Corel Image Features Data Set. data retrieved from University of California Irvine Machine Learning Repository http:\/\/archive.ics.uci.edu\/ml\/datasets\/Corel+Image+Features.  Michael Ortega-Binderberger Kriengkrai Porkaew and Sharad Mehrotra. 1999. Corel Image Features Data Set. data retrieved from University of California Irvine Machine Learning Repository http:\/\/archive.ics.uci.edu\/ml\/datasets\/Corel+Image+Features."},{"key":"e_1_3_2_1_33_1","unstructured":"O. J. Oyelade O. O. Oladipupo and I. C. Obagbuwa. 2010. Application of k Means Clustering algorithm for prediction of Students Academic Performance. CoRR abs\/1002.2425 (2010). arXiv:1002.2425  O. J. Oyelade O. O. Oladipupo and I. C. Obagbuwa. 2010. Application of k Means Clustering algorithm for prediction of Students Academic Performance. CoRR abs\/1002.2425 (2010). arXiv:1002.2425"},{"key":"e_1_3_2_1_34_1","first-page":"3","article-title":"OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems","volume":"12","author":"Stone John E.","year":"2010","journal-title":"IEEE Des. Test"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2903150.2903155"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2013.4"}],"event":{"name":"PPoPP '19: 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages","SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing"],"location":"Washington DC USA","acronym":"PPoPP '19"},"container-title":["Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3303084.3309488","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3303084.3309488","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3303084.3309488","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:53:39Z","timestamp":1750204419000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3303084.3309488"}},"subtitle":["A Case Study of K-Means on GPU"],"short-title":[],"issued":{"date-parts":[[2019,2,17]]},"references-count":36,"alternative-id":["10.1145\/3303084.3309488","10.1145\/3303084"],"URL":"https:\/\/doi.org\/10.1145\/3303084.3309488","relation":{},"subject":[],"published":{"date-parts":[[2019,2,17]]},"assertion":[{"value":"2019-02-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}