{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T02:00:17Z","timestamp":1776132017713,"version":"3.50.1"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2011,12,1]],"date-time":"2011-12-01T00:00:00Z","timestamp":1322697600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:p>Image processing operations like blurring, inverse convolution, and summed-area tables are often computed efficiently as a sequence of 1D recursive filters. While much research has explored parallel recursive filtering, prior techniques do not optimize across the entire filter sequence. Typically, a separate filter (or often a causal-anticausal filter pair) is required in each dimension. Computing these filter passes independently results in significant traffic to global memory, creating a bottleneck in GPU systems. We present a new algorithmic framework for parallel evaluation. It partitions the image into 2D blocks, with a small band of additional data buffered along each block perimeter. We show that these perimeter bands are sufficient to accumulate the effects of the successive filters. A remarkable result is that the image data is read only twice and written just once, independent of image size, and thus total memory bandwidth is reduced even compared to the traditional serial algorithm. 
We demonstrate significant speedups in GPU computation.<\/jats:p>","DOI":"10.1145\/2070781.2024210","type":"journal-article","created":{"date-parts":[[2011,11,30]],"date-time":"2011-11-30T13:58:46Z","timestamp":1322661526000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":47,"title":["GPU-efficient recursive filtering and summed-area tables"],"prefix":"10.1145","volume":"30","author":[{"given":"Diego","family":"Nehab","sequence":"first","affiliation":[{"name":"IMPA"}]},{"given":"Andr\u00e9","family":"Maximo","sequence":"additional","affiliation":[{"name":"IMPA"}]},{"given":"Rodolfo S.","family":"Lima","sequence":"additional","affiliation":[{"name":"Digitok"}]},{"given":"Hugues","family":"Hoppe","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]}],"member":"320","published-online":{"date-parts":[[2011,12,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.42122"},{"key":"e_1_2_1_3_1","volume-title":"IEEE International Conference on Image Processing","volume":"3","author":"Blu T."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/83.931101"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/800031.808600"},{"key":"e_1_2_1_6_1","unstructured":"CUDPP library. 2011. URL http:\/\/code.google.com\/p\/cudpp\/."},{"key":"e_1_2_1_7_1","unstructured":"CUFFT library. 2007. URL http:\/\/developer.nvidia.com\/cuda-toolkit. NVIDIA Corporation."},
{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 2nd Conference on Image Processing, 263--267","author":"Deriche R.","year":"1992"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375527.1375559"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2010.61"},{"key":"e_1_2_1_11_1","unstructured":"Harris M. Sengupta S. and Owens J. D. 2008. Parallel prefix sum (scan) with CUDA. In GPU Gems 3 chapter 39."},{"key":"e_1_2_1_12_1","unstructured":"Hensley J. 2010. Advanced rendering technique with DirectX 11: High-quality depth of field. Gamefest 2010 talk."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2005.00880.x"},{"key":"e_1_2_1_14_1","volume-title":"Parallel Computers: Architecture, Programming and Algorithms. Adam Hilger.","author":"Hockney R. W.","year":"1981"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Iverson K. E. 1962. A Programming Language. Wiley.","DOI":"10.1145\/1460833.1460872"},{"key":"e_1_2_1_16_1","unstructured":"Kass M. Lefohn A. and Owens J. D. 2006. Interactive depth of field using simulated diffusion on a GPU. Technical Report #06-01 Pixar Animation Studios."},{"key":"e_1_2_1_17_1","unstructured":"Kirk D. B. and Hwu W. W. 2010. Programming Massively Parallel Processors. Morgan Kaufmann."},
{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1973.5009159"},{"key":"e_1_2_1_19_1","unstructured":"Lamas-Rodr\u00edgues J. Heras D. B. B\u00f3o M. and Arg\u00fcello F. 2011. Tridiagonal solvers internal report. Technical report University of Santiago de Compostela."},{"key":"e_1_2_1_20_1","volume-title":"Technical Report CS2009-14, University of Virginia.","author":"Merrill D.","year":"2009"},{"key":"e_1_2_1_21_1","unstructured":"Oppenheim A. V. and Schafer R. W. 2010. Discrete-Time Signal Processing. Prentice Hall 3rd edition."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/29.32286"},{"key":"e_1_2_1_23_1","unstructured":"Podlozhnyuk V. 2007. Image convolution with CUDA. NVIDIA whitepaper."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/bxq086"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of Graphics Hardware, 97--106","author":"Sengupta S."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1971.223205"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/321738.321741"},{"key":"e_1_2_1_28_1","volume-title":"IEEE International Conference on Acoustics, Speech and Signal Processing, 257--260","author":"Sung W."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.113086"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 14th International Conference on Pattern Recognition, 509--514 (v. 1).","author":"van Vliet L. 
J."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1693453.1693472"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2070781.2024210","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2070781.2024210","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:03Z","timestamp":1750241163000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2070781.2024210"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,12]]},"references-count":30,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["10.1145\/2070781.2024210"],"URL":"https:\/\/doi.org\/10.1145\/2070781.2024210","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,12]]},"assertion":[{"value":"2011-12-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}