{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:52:50Z","timestamp":1772121170814,"version":"3.50.1"},"reference-count":20,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T00:00:00Z","timestamp":1675209600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["HA 3088\/26-1"],"award-info":[{"award-number":["HA 3088\/26-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>In this article, we introduce a parallel algorithm for connected-component analysis (CCA) on GPUs which drastically reduces the volume of data to transfer from GPU to the host. CCA algorithms targeting GPUs typically store the extracted features in arrays large enough to potentially hold the maximum possible number of objects for the given image size. Transferring these large arrays to the host requires large portions of the overall execution time. Therefore, we propose an algorithm which uses a CUDA kernel to merge trees of connected component feature structs. During the tree merging, various connected-component properties, such as total area, centroid and bounding box, are extracted and accumulated. The tree structure then enables us to only transfer features of valid objects to the host for further processing or storing. Our benchmarks show that this implementation significantly reduces memory transfer volume for processing results on the host whilst maintaining similar performance to state-of-the-art CCA algorithms.<\/jats:p>","DOI":"10.3390\/a16020080","type":"journal-article","created":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T03:47:33Z","timestamp":1675309653000},"page":"80","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Parallel Algorithm for Connected-Component Analysis Using CUDA"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3558-5750","authenticated-orcid":false,"given":"Dominic","family":"Windisch","sequence":"first","affiliation":[{"name":"Institute of Power Engineering, Technische Universit\u00e4t Dresden, 01062 Dresden, Germany"}]},{"given":"Christian","family":"Kaever","sequence":"additional","affiliation":[{"name":"Helmholtz-Zentrum Dresden\u2014Rossendorf, Bautzner Landstr. 400, 01328 Dresden, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9935-4428","authenticated-orcid":false,"given":"Guido","family":"Juckeland","sequence":"additional","affiliation":[{"name":"Helmholtz-Zentrum Dresden\u2014Rossendorf, Bautzner Landstr. 400, 01328 Dresden, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3428-5019","authenticated-orcid":false,"given":"Andr\u00e9","family":"Bieberle","sequence":"additional","affiliation":[{"name":"Helmholtz-Zentrum Dresden\u2014Rossendorf, Bautzner Landstr. 400, 01328 Dresden, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.measurement.2014.04.008","article-title":"Automatic segmentation, counting, size determination and classification of white blood cells","volume":"55","author":"Nazlibilek","year":"2014","journal-title":"Measurement"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"8459","DOI":"10.1007\/s11042-019-7347-4","article-title":"Automatic segmentation of liver & lesion detection using H-minima transform and connecting component labeling","volume":"79","author":"Khan","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"8189403","DOI":"10.1155\/2020\/8189403","article-title":"Fabric Defect Detection Using Computer Vision Techniques: A Comprehensive Review","volume":"2020","author":"Rasheed","year":"2020","journal-title":"Math. Probl. Eng."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1007\/s40684-021-00343-6","article-title":"State of the Art in Defect Detection Based on Machine Vision","volume":"9","author":"Ren","year":"2022","journal-title":"Int. J. Precis. Eng. Manuf.-Green Technol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1813","DOI":"10.1007\/s11554-017-0689-0","article-title":"Real-time embedded system for traffic sign recognition based on ZedBoard","volume":"16","author":"Farhat","year":"2019","journal-title":"J. Real-Time Image Process."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"You, S., Bi, Q., Ji, Y., Liu, S., Feng, Y., and Wu, F. (2020). Traffic Sign Detection Method Based on Improved SSD. Information, 11.","DOI":"10.3390\/info11100475"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2254","DOI":"10.1016\/j.nucengdes.2009.11.016","article-title":"Ultra fast electron beam X-ray computed tomography for two-phase flow measurement","volume":"240","author":"Fischer","year":"2010","journal-title":"Nucl. Eng. Des."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1016\/j.cpc.2017.05.025","article-title":"Rapid data processing for ultrafast X-ray computed tomography using scalable and modular CUDA based pipelines","volume":"219","author":"Frust","year":"2017","journal-title":"Comput. Phys. Commun."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Windisch, D., Kelling, J., Juckeland, G., Bieberle, A., and Hampel, U. (2022). Real-time Data Processing for Ultrafast X-Ray Computed Tomography using Modular CUDA based Pipelines. Trans. Inst. Meas. Control, submitted.","DOI":"10.1016\/j.cpc.2023.108719"},{"key":"ref_10","first-page":"691","article-title":"Control concepts for image-based structure tracking with ultrafast electron beam X-ray tomography","volume":"42","author":"Windisch","year":"2020","journal-title":"Comput. Phys. Commun."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1217","DOI":"10.1109\/TPDS.2018.2799216","article-title":"A New Algorithm for Parallel Connected-Component Labelling on GPUs","volume":"29","author":"Playne","year":"2018","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_12","unstructured":"Kaever, C. (2021). Real-Time Object Recognition for Ultrafast Electron Beam X-ray Computed Tomography, Technische Universit\u00e4t Dresden. Technical Report."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1145\/321356.321357","article-title":"Sequential Operations in Digital Picture Processing","volume":"13","author":"Rosenfeld","year":"1966","journal-title":"J. ACM"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1016\/S0146-664X(77)80015-4","article-title":"A sequential approach to the extraction of shape features","volume":"6","author":"Agrawala","year":"1977","journal-title":"Comput. Graph. Image Process."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1007\/s11554-009-0134-0","article-title":"Light speed labeling: Efficient connected component labeling on RISC architectures","volume":"6","author":"Lacassagne","year":"2011","journal-title":"J. Real-Time Image Process."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Hennequin, A., Lacassagne, L., Cabaret, L., and Meunier, Q. (2018, January 10\u201312). A new Direct Connected Component Labeling and Analysis Algorithms for GPUs. Proceedings of the 2018 Conference on Design and Architectures for Signal and Image Processing (DASIP), Porto, Portugal.","DOI":"10.1109\/DASIP.2018.8596835"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1007\/s11554-016-0574-2","article-title":"Parallel Light Speed Labeling: An efficient connected component algorithm for labeling and analysis on multi-core processors","volume":"15","author":"Cabaret","year":"2018","journal-title":"J. Real-Time Image Process."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"\u0158iha, L., and Mareboyana, M. (2011, January 5\u20137). GPU accelerated one-pass algorithm for computing minimal rectangles of connected components. Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision (WACV), Kona, HI, USA.","DOI":"10.1109\/WACV.2011.5711542"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lemaitre, F., Hennequin, A., and Lacassagne, L. (2021, January 6\u201312). Taming Voting Algorithms on Gpus for an Efficient Connected Component Analysis Algorithm. Proceedings of the ICASSP 2021\u20142021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9413653"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1109\/TPDS.2019.2934683","article-title":"Optimized Block-Based Algorithms to Label Connected Components on GPUs","volume":"31","author":"Allegretti","year":"2019","journal-title":"IEEE Trans. Parallel Distrib. Syst."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/2\/80\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:21:27Z","timestamp":1760120487000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/2\/80"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,1]]},"references-count":20,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["a16020080"],"URL":"https:\/\/doi.org\/10.3390\/a16020080","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,1]]}}}