{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:21:27Z","timestamp":1750306887495,"version":"3.41.0"},"reference-count":16,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2012,3,25]],"date-time":"2012-03-25T00:00:00Z","timestamp":1332633600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2012,12,25]]},"abstract":"<jats:p>This paper evaluates and discusses how different GPU programming frameworks affect the performance obtained from GPU acceleration of the striped Smith-Waterman algorithm used for biological sequence alignment. Six GPU implementations of the algorithm on NVIDIA GT200b and AMD RV870, using the CUDA and OpenCL frameworks, are compared to analyze the pros and cons of explicitly describing architecture-specific hardware mechanisms in the code. The evaluation results show that primitive descriptions in CUDA are still efficient, especially for small data sizes, while the OpenCL compiler carries out better instruction scheduling and optimizations. 
On the other hand, the combination of OpenCL and RV870, which provides a relatively simple view of the architecture, is efficient for large data sizes.<\/jats:p>","DOI":"10.1145\/2460216.2460229","type":"journal-article","created":{"date-parts":[[2013,4,9]],"date-time":"2013-04-09T12:17:58Z","timestamp":1365509878000},"page":"70-75","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Performance comparison of GPU programming frameworks with the striped Smith-Waterman algorithm"],"prefix":"10.1145","volume":"40","author":[{"given":"Takeshi","family":"Kakimoto","sequence":"first","affiliation":[{"name":"Nagasaki University, Japan"}]},{"given":"Keisuke","family":"Dohi","sequence":"additional","affiliation":[{"name":"Nagasaki University, Japan"}]},{"given":"Yuichiro","family":"Shibata","sequence":"additional","affiliation":[{"name":"Nagasaki University, Japan"}]},{"given":"Kiyoshi","family":"Oguri","sequence":"additional","affiliation":[{"name":"Nagasaki University, Japan"}]}],"member":"320","published-online":{"date-parts":[[2012,3,25]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"1","volume-title":"Storage and Analysis, 2008. SC 2008. International Conference for","author":"Datta K.","year":"2009","unstructured":"K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick, \"Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures,\" in High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for, pp. 
1--12, IEEE, 2009."},{"key":"e_1_2_1_2_1","first-page":"126","volume-title":"FPL 2009. International Conference on","author":"Asano S.","year":"2009","unstructured":"S. Asano, T. Maruyama, and Y. Yamaguchi, \"Performance comparison of fpga, gpu and cpu in image processing,\" in Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on, pp. 126--131, IEEE, 2009."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(81)90087-5"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(82)90398-9"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.89.22.10915"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-9-S2-S10"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5160931"},{"key":"e_1_2_1_8_1","first-page":"29","volume-title":"2010 21st IEEE International Conference on","author":"Dohi K.","year":"2010","unstructured":"K. Dohi, K. Benkridt, C. Ling, T. Hamada, and Y. Shibata, \"Highly efficient mapping of the smith-waterman algorithm on cuda-compatible gpus,\" in Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference on, pp. 
29--36, IEEE, 2010."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1186\/1756-0500-3-93"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl582"},{"key":"e_1_2_1_11_1","unstructured":"M. Farrar \"Optimizing smith-waterman for the cell broadband engine\""},{"issue":"2","key":"e_1_2_1_12_1","first-page":"145","article-title":"Using video-oriented instructions to speed up sequence comparison","volume":"13","author":"Wozniak A.","year":"1997","unstructured":"A. Wozniak, \"Using video-oriented instructions to speed up sequence comparison,\" Comput Appl Biosci, vol. 13, no. 2, pp. 145--150, 1997.","journal-title":"Comput Appl Biosci"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/16.8.699"},{"key":"e_1_2_1_14_1","unstructured":"\"CUDA ZONE.\" http:\/\/www.nvidia.com\/object\/cuda home.html."},{"key":"e_1_2_1_15_1","unstructured":"\"OpenCL.\" http:\/\/www.khronos.org\/opencl\/."},{"key":"e_1_2_1_16_1","unstructured":"
\"UniProt.\" http:\/\/www.uniprot.org\/."}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2460216.2460229","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2460216.2460229","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:18:46Z","timestamp":1750234726000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2460216.2460229"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,3,25]]},"references-count":16,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2012,12,25]]}},"alternative-id":["10.1145\/2460216.2460229"],"URL":"https:\/\/doi.org\/10.1145\/2460216.2460229","relation":{},"ISSN":["0163-5964"],"issn-type":[{"type":"print","value":"0163-5964"}],"subject":[],"published":{"date-parts":[[2012,3,25]]},"assertion":[{"value":"2012-03-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}