{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T22:40:05Z","timestamp":1741041605532,"version":"3.38.0"},"reference-count":30,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2010,11,23]],"date-time":"2010-11-23T00:00:00Z","timestamp":1290470400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2011,5]]},"abstract":"<jats:p> Programs developed under the Compute Unified Device Architecture obtain the highest performance rate, when the exploitation of hardware resources on a Graphics Processing Unit (GPU) is maximized. In order to achieve this purpose, load balancing among threads and a high value of processor occupancy, i.e. the ratio of active threads, are indispensable. However, in certain applications, an optimally balanced implementation may limit the occupancy, due to a greater need for registers and shared memory. This is the case of the Fast Generalized Hough Transform (Fast GHT), an image-processing technique for localizing an object within an image. In this work, we present two parallelization alternatives for the Fast GHT, one that optimizes the load balancing and another that maximizes the occupancy. We have compared them using a large amount of real images to test their strong and weak points and we have drawn several conclusions about under which conditions it is better to use one or the other. We have also tackled several parallelization problems related to sparse data distribution, divergent execution paths, and irregular memory access patterns in updating operations by proposing a set of generic techniques, including compacting, sorting, and memory storage replication. Finally, we have compared our Fast GHT with the classic GHT, both on a current GPU, obtaining an important speed-up. <\/jats:p>","DOI":"10.1177\/1094342010383998","type":"journal-article","created":{"date-parts":[[2010,11,24]],"date-time":"2010-11-24T01:49:31Z","timestamp":1290563371000},"page":"205-222","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":5,"title":["Load Balancing versus Occupancy Maximization on Graphics Processing Units: The Generalized Hough Transform as a Case Study"],"prefix":"10.1177","volume":"25","author":[{"given":"Juan","family":"G\u00f3mez-Luna","sequence":"first","affiliation":[{"name":"Computer Architecture and Electronics Department, University of C\u00f3rdoba, C\u00f3rdoba, Spain,"}]},{"given":"Jos\u00e9 Mar\u00eda","family":"Gonz\u00e1lez-Linares","sequence":"additional","affiliation":[{"name":"Computer Architecture Department, University of M\u00e1laga, M\u00e1laga, Spain"}]},{"given":"Jos\u00e9","family":"Ignacio Benavides","sequence":"additional","affiliation":[{"name":"Computer Architecture and Electronics Department, University of C\u00f3rdoba, C\u00f3rdoba"}]},{"given":"Emilio L.","family":"Zapata","sequence":"additional","affiliation":[{"name":"Computer Architecture Department, University of M\u00e1laga, M\u00e1laga, Spain"}]},{"given":"Nicol\u00e1s","family":"Guil","sequence":"additional","affiliation":[{"name":"Computer Architecture Department, University of M\u00e1laga, M\u00e1laga, Spain"}]}],"member":"179","published-online":{"date-parts":[[2010,11,23]]},"reference":[{"key":"atypb1","doi-asserted-by":"publisher","DOI":"10.1016\/0031-3203(81)90009-1"},{"volume-title":"Proceedings of Conference on High Performance Graphics (HPG\u201909)","author":"Billeter, M.","key":"atypb2"},{"key":"atypb3","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.1986.4767851"},{"volume-title":"CUDA Data Parallel Primitives Library home page","year":"2010","author":"Cudpp","key":"atypb4"},{"volume-title":"Proceedings of 22nd International Conference on Supercomputing (ICS\u201908)","author":"Dotsenko, Y.","key":"atypb5"},{"key":"atypb6","doi-asserted-by":"publisher","DOI":"10.1145\/361237.361242"},{"volume-title":"Proceedings of 15th International Euro-Par Conference (Euro-Par\u201909)","author":"G\u00f3mez-Luna, J.","key":"atypb7"},{"volume-title":"CUDA particles","year":"2007","author":"Green, S.","key":"atypb8"},{"key":"atypb9","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(98)00127-7"},{"volume-title":"Optimizing parallel reduction in CUDA","year":"2007","author":"Harris, M.","key":"atypb10"},{"volume-title":"Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD\u201908)","author":"He, B.","key":"atypb11"},{"key":"atypb12","volume":"3069654","author":"Hough, P.V.C.","year":"1962","journal-title":"U.S. Patent"},{"volume-title":"Parallelization of the classic GHT on GPU","year":"2010","author":"Lucena, J.","key":"atypb13"},{"volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops","author":"Luo, Y.","key":"atypb14"},{"volume-title":"NVIDIA CUDA home page","year":"2007","author":"Nvidia","key":"atypb15"},{"volume-title":"NVIDIA CUDA SDK","year":"2007","author":"Nvidia","key":"atypb16"},{"volume-title":"NVIDIA CUDA C programming best practices guide version 3.0","year":"2010","author":"Nvidia","key":"atypb17"},{"volume-title":"NVIDIA CUDA programming guide version 3.0","year":"2010","author":"Nvidia","key":"atypb18"},{"volume-title":"NVIDIA CUDA Visual Profiler","year":"2010","author":"Nvidia","key":"atypb19"},{"volume-title":"OpenCL home page","year":"2009","author":"OpenCL","key":"atypb20"},{"volume-title":"Histogram calculation in CUDA","year":"2007","author":"Podlozhnyuk, V.","key":"atypb21"},{"volume-title":"Image convolution with CUDA","year":"2007","author":"Podlozhnyuk, V.","key":"atypb22"},{"key":"atypb23","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2008.05.011"},{"volume-title":"Proceedings of Image and Video Communications and Processing (SPIE\u201903)","author":"S\u00e1ez, E.","key":"atypb24"},{"volume-title":"Proceedings of Image and Video Communications and Processing (SPIE\u201903)","author":"S\u00e1ez, E.","key":"atypb25"},{"volume-title":"Efficient parallel scan algorithms for GPUs. NVR-2008-003","year":"2008","author":"Sengupta, S.","key":"atypb26"},{"volume-title":"Proceedings of International Conference on Signal Processing and Communications Systems (ICSPCS\u201907)","author":"Shams, R.","key":"atypb27"},{"volume-title":"Proceedings of 12th International Conference on Image Analysis and Processing (ICIAP\u201903)","author":"Strzodka, R.","key":"atypb28"},{"volume-title":"Proceedings of 23rd International Conference on Supercomputing (ICS\u201909)","author":"Venkatasubramanian, S.","key":"atypb29"},{"volume-title":"Proceedings of the 2008 ACM\/ IEEE Conference on Supercomputing (SC\u201908)","author":"Volkov, V.","key":"atypb30"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342010383998","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342010383998","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T21:59:58Z","timestamp":1741039198000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342010383998"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,11,23]]},"references-count":30,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2011,5]]}},"alternative-id":["10.1177\/1094342010383998"],"URL":"https:\/\/doi.org\/10.1177\/1094342010383998","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2010,11,23]]}}}