{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T16:48:54Z","timestamp":1771951734304,"version":"3.50.1"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2018,3,14]],"date-time":"2018-03-14T00:00:00Z","timestamp":1520985600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000266","name":"EPSRC","doi-asserted-by":"crossref","award":["EP\/K009931\/1,EP\/N014758\/1,EP\/N028201\/1"],"award-info":[{"award-number":["EP\/K009931\/1,EP\/N014758\/1,EP\/N028201\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2018,3,31]]},"abstract":"<jats:p>Specialized FPGA implementations can deliver higher performance and greater power efficiency than embedded CPU or GPU implementations for real-time image processing. Programming challenges limit their wider use, because the implementation of FPGA architectures at the register transfer level is time consuming and error prone. Existing software languages supported by high-level synthesis (HLS), although providing a productivity improvement, are too general purpose to generate efficient hardware without the use of hardware-specific code optimizations. 
Such optimizations leak hardware details into the abstractions that software languages are there to provide, and they require knowledge of FPGAs to generate efficient hardware, such as by using language pragmas to partition data structures across memory blocks.<\/jats:p>\n          <jats:p>This article presents a thorough account of the Rathlin image processing language (RIPL), a high-level image processing domain-specific language for FPGAs. We motivate its design, based on higher-order algorithmic skeletons, with requirements from the image processing domain. RIPL\u2019s skeletons suffice to elegantly describe image processing stencils, as well as recursive algorithms with nonlocal random access patterns. At its core, RIPL employs a dataflow intermediate representation. We give a formal account of the compilation scheme from RIPL skeletons to static and cyclostatic dataflow models to describe their data rates and static scheduling on FPGAs.<\/jats:p>\n          <jats:p>RIPL compares favorably to the Vivado HLS OpenCV library and C++ compiled with Vivado HLS. RIPL achieves between 54 and 191 frames per second (FPS) at 100MHz for four synthetic benchmarks, faster than HLS OpenCV in three cases. Two real-world algorithms are implemented in RIPL: visual saliency and mean shift segmentation. For the visual saliency algorithm, RIPL achieves 71 FPS compared to optimized C++ at 28 FPS. RIPL is also concise, being 5x shorter than C++ and 111x shorter than an equivalent direct dataflow implementation. 
For mean shift segmentation, RIPL achieves 7 FPS compared to optimized C++ on 64 CPU cores at 1.1 FPS, and RIPL is 10x shorter than the direct dataflow FPGA implementation.<\/jats:p>","DOI":"10.1145\/3180481","type":"journal-article","created":{"date-parts":[[2018,3,14]],"date-time":"2018-03-14T12:34:20Z","timestamp":1521030860000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["RIPL"],"prefix":"10.1145","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0365-693X","authenticated-orcid":false,"given":"Robert","family":"Stewart","sequence":"first","affiliation":[{"name":"Heriot-Watt University, Edinburgh, UK"}]},{"given":"Kirsty","family":"Duncan","sequence":"additional","affiliation":[{"name":"Heriot-Watt University, Edinburgh, UK"}]},{"given":"Greg","family":"Michaelson","sequence":"additional","affiliation":[{"name":"Heriot-Watt University, Edinburgh, UK"}]},{"given":"Paulo","family":"Garcia","sequence":"additional","affiliation":[{"name":"Heriot-Watt University, Edinburgh, UK"}]},{"given":"Deepayan","family":"Bhowmik","sequence":"additional","affiliation":[{"name":"Sheffield Hallam University, Sheffield, UK"}]},{"given":"Andrew","family":"Wallace","sequence":"additional","affiliation":[{"name":"Heriot-Watt University, Edinburgh, UK"}]}],"member":"320","published-online":{"date-parts":[[2018,3,14]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2016.18"},{"key":"e_1_2_1_2_1","volume-title":"DSP Builder for Intel FPGAs","year":"2018","unstructured":"Altera. 2017. DSP Builder for Intel FPGAs. 
Retrieved February 4, 2018, from https:\/\/www.altera.com\/products\/design-software\/model---simulation\/dsp-builder\/overview.html."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2004.36"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS\u201916)","author":"Bezati Endri","unstructured":"Endri Bezati, Simone Casale Brunet, Marco Mattavelli, and J\u00f6rn W. Janneck. 2016. High-level synthesis of dynamic dataflow programs on heterogeneous MPSoC platforms. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS\u201916). IEEE, Los Alamitos, CA, 227--234."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/DASIP.2017.8122128"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2016.2627241"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.485935"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.89"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1155\/2010\/540159"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1926354.1926358"},{"key":"e_1_2_1_12_1","volume-title":"Algorithmic Skeletons: Structured Management of Parallel Computation","author":"Cole Murray","year":"1991","unstructured":"Murray Cole. 1991. Algorithmic Skeletons: Structured Management of Parallel Computation. 
MIT Press, Cambridge, MA."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/850924.851593"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2000.854761"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/508352.508353"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02476026"},{"key":"e_1_2_1_17_1","volume-title":"CAL Language Report Specification of the CAL Actor Language","author":"Eker Johan","year":"2003","unstructured":"Johan Eker and Jorn W. Janneck. 2003. CAL Language Report Specification of the CAL Actor Language. Technical Report UCB\/ERL M03\/48. EECS Department, University of California, Berkeley."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145694.2145704"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1975.1055330"},{"key":"e_1_2_1_20_1","volume-title":"Digital Image Processing","author":"Gonz\u00e1lez Rafael C.","year":"1992","unstructured":"Rafael C. Gonz\u00e1lez and Richard E. Woods. 1992. Digital Image Processing. Addison-Wesley, Reading, MA."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2601097.2601174"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925892"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00165-003-0016-3"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 2012 Symposium on VLSI Technology (VLSIT\u201912)","author":"Jeddeloh J.","unstructured":"J. Jeddeloh and B. Keeth. 
2012. Hybrid Memory Cube new DRAM architecture increases density and performance. In Proceedings of the 2012 Symposium on VLSI Technology (VLSIT\u201912). IEEE, Los Alamitos, CA, 87--88."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the ACM SIGPLAN Haskell Workshop","author":"Jones S. Peyton","unstructured":"S. Peyton Jones, A. Tolmach, and T. Hoare. 2001. Playing by the rules: Rewriting as a practical optimisation technique in GHC. In Proceedings of the ACM SIGPLAN Haskell Workshop. ACM, New York, NY, 203--233."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2003.1251157"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-29822-6_15"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 32nd IEEE Computer Society International Conference (COMPCON\u201987)","author":"Lee Edward A.","unstructured":"Edward A. Lee and David G. Messerschmitt. 1987. Synchronous data flow: Describing signal processing algorithm for parallel computation. In Proceedings of the 32nd IEEE Computer Society International Conference (COMPCON\u201987). IEEE, Los Alamitos, CA, 310--315."},{"key":"e_1_2_1_29_1","volume-title":"Readings in Hardware\/Software Co-Design","author":"Lee Edward A.","year":"2002","unstructured":"Edward A. Lee and Thomas M. Parks. 2002. Dataflow process networks. 
In Readings in Hardware\/Software Co-Design, G. De Micheli, R. Ernst, and W. Wolf (Eds.). Kluwer Academic Publishers, Norwell, MA, 59--85."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2016.2624265"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2001.937655"},{"key":"e_1_2_1_32_1","volume-title":"FPGA Design and SoC Codesign","year":"2018","unstructured":"MathWorks. 2017. FPGA Design and SoC Codesign. Retrieved February 4, 2018, from https:\/\/uk.mathworks.com\/solutions\/fpga-design.html."},{"key":"e_1_2_1_33_1","volume-title":"SISAL: Streams and Iteration in a Single Assignment Language, Language Reference Manual Version 1.2","author":"McGraw J.","year":"1985","unstructured":"J. McGraw, S. Skedzielewski, S. Allan, R. Oldehoeft, J. Glauert, C. Kirkham, B. Noyce, and R. Thomas. 1985. SISAL: Streams and Iteration in a Single Assignment Language, Language Reference Manual Version 1.2. Lawrence Livermore National Laboratory, Livermore, CA."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2513673"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3107953"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/LES.2014.2320556"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180481"},{"key":"e_1_2_1_39_1","series-title":"Lecture Notes in Computer Science","volume-title":"Algorithms and Architectures for Parallel Processing","author":"Stewart Robert","unstructured":"Robert Stewart, Greg J. 
Michaelson, Deepayan Bhowmik, Paulo Garcia, and Andy Wallace. 2016. A dataflow IR for memory efficient RIPL compilation to FPGAs. In Algorithms and Architectures for Parallel Processing. Lecture Notes in Computer Science, Vol. 1194. Springer, 174--188."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11265-015-1044-y"},{"key":"e_1_2_1_41_1","volume-title":"JPEG2000 Image Compression Fundamentals, Standards and Practice","author":"Taubman David","unstructured":"David Taubman and Michael Marcellin. 2012. JPEG2000 Image Compression Fundamentals, Standards and Practice. Vol. 642. Springer Science & Business Media, Berlin, Germany."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1508128.1508139"},{"key":"e_1_2_1_43_1","volume-title":"The Verilog Hardware Description Language","author":"Thomas Donald E.","year":"1996","unstructured":"Donald E. Thomas and Philip Moorby. 1996. The Verilog Hardware Description Language (3rd ed.). Kluwer, Boston, MA."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/216585.216588"},{"key":"e_1_2_1_46_1","volume-title":"System Generator for DSP","year":"2018","unstructured":"Xilinx. 2017a. System Generator for DSP. 
Retrieved February 4, 2018, from https:\/\/www.xilinx.com\/products\/design-tools\/vivado\/integration\/sysgen.html."},{"key":"e_1_2_1_47_1","volume-title":"Vivado High-Level Synthesis","year":"2018","unstructured":"Xilinx. 2017b. Vivado High-Level Synthesis. Retrieved February 4, 2018, from https:\/\/www.xilinx.com\/products\/design-tools\/vivado\/integration\/esl-design.html."}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3180481","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3180481","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:26:22Z","timestamp":1750213582000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3180481"}},"subtitle":["A Parallel Image Processing Language for FPGAs"],"short-title":[],"issued":{"date-parts":[[2018,3,14]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,3,31]]}},"alternative-id":["10.1145\/3180481"],"URL":"https:\/\/doi.org\/10.1145\/3180481","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,3,14]]},"assertion":[{"value":"2017-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2018-03-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}