{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:31:46Z","timestamp":1750221106611,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2018,12,31]],"date-time":"2018-12-31T00:00:00Z","timestamp":1546214400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"crossref","award":["EP\/K008730\/1"],"award-info":[{"award-number":["EP\/K008730\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2018,12,31]]},"abstract":"<jats:p>Focal-plane Sensor-Processor Arrays (FPSPs) are new imaging devices with parallel Single Instruction Multiple Data (SIMD) computational capabilities built into every pixel. Compared to traditional imaging devices, FPSPs allow for massive pixel-parallel execution of image processing algorithms. This enables the application of certain algorithms at extreme frame rates (&gt;10,000 frames per second). By performing some early-stage processing in-situ, systems incorporating FPSPs can consume less power compared to conventional approaches using standard digital cameras. 
In this article, we explore code generation for an FPSP whose 256 \u00d7 256 processors operate on analogue signal data, leading to further opportunities for power reduction\u2014and additional code synthesis challenges.<\/jats:p>\n          <jats:p>While rudimentary image processing algorithms have been demonstrated on FPSPs before, progress with higher-level computer vision algorithms has been sparse due to the unique architecture and limits of the devices. This article presents a code generator for convolution filters for the SCAMP-5 FPSP, with applications in many high-level tasks such as convolutional neural networks, pose estimation, and so on. The SCAMP-5 FPSP has no effective multiply operator. Convolutions have to be implemented through sequences of more primitive operations such as additions, subtractions, and multiplications\/divisions by two. We present a code generation algorithm to optimise convolutions by identifying common factors in the different weights and by determining an optimised pattern of pixel-to-pixel data movements to exploit them. We present evaluation in terms of both speed and energy consumption for a suite of well-known convolution filters. 
Furthermore, an application of the method is shown by the implementation of a Viola-Jones face detection algorithm.<\/jats:p>","DOI":"10.1145\/3291055","type":"journal-article","created":{"date-parts":[[2019,1,8]],"date-time":"2019-01-08T15:53:12Z","timestamp":1546962792000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["AUKE"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0810-2557","authenticated-orcid":false,"given":"Thomas","family":"Debrunner","sequence":"first","affiliation":[{"name":"Imperial College London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6385-6127","authenticated-orcid":false,"given":"Sajad","family":"Saeedi","sequence":"additional","affiliation":[{"name":"Imperial College London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5905-1804","authenticated-orcid":false,"given":"Paul H. J.","family":"Kelly","sequence":"additional","affiliation":[{"name":"Imperial College London, UK"}]}],"member":"320","published-online":{"date-parts":[[2019,1,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jestch.2015.06.006"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-018-0180-5"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TEC.1961.5219227"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277548.1277552"},{"volume-title":"Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV\u201917)","author":"Bose L.","key":"e_1_2_1_5_1","unstructured":"L. Bose, J. Chen, S. J. Carey, P. Dudek, and W. Mayol-Cuevas. 2017. Visual odometry for pixel processor arrays. 
In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV\u201917). 4614--4622."},{"key":"e_1_2_1_6_1","first-page":"401","article-title":"Primitive operator digital filters","volume":"138","author":"Bull D. R.","year":"1991","unstructured":"D. R. Bull and D. H Horrocks. 1991. Primitive operator digital filters. IEE Proc. G 138, 3 (1991), 401--412.","journal-title":"IEE Proc. G"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CNNA.2012.6331468"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 2013 Symposium on VLSI Circuits (VLSIC\u201913)","author":"Carey Stephen J","year":"2013","unstructured":"Stephen J Carey, Alexey Lopich, David RW Barr, Bin Wang, and Piotr Dudek. 2013. A 100,000 fps vision sensor with embedded 535GOPS\/W 256 \u00d7 256 SIMD processor array. In Proceedings of the 2013 Symposium on VLSI Circuits (VLSIC\u201913). IEEE, C182--C183."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2011.5937875"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"Chen H. G.","key":"e_1_2_1_10_1","unstructured":"H. G. Chen, S. Jayasuriya, J. Yang, J. Stephen, S. Sivaramakrishnan, A. Veeraraghavan, and A. Molnar. 
2016. ASP vision: Optically computing the first layer of convolutional neural networks using angle sensitive pixels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 903--912."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062297"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","DOI":"10.1109\/82.466647","article-title":"Use of minimum-adder multiplier blocks in FIR digital filters","author":"Dempster Andrew G.","year":"1995","unstructured":"Andrew G. Dempster and Malcolm D. Macleod. 1995. Use of minimum-adder multiplier blocks in FIR digital filters. IEEE Trans. Circ. Syst. II 42, 9 (1995), 569--577.","journal-title":"IEEE Trans. Circ. Syst."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/4.597292"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2003.1205136"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2005.1465958"},{"key":"e_1_2_1_17_1","article-title":"A CMOS general-purpose sampled-data analog processing element","author":"Dudek Piotr","year":"2000","unstructured":"Piotr Dudek and Peter J. Hicks. 2000. A CMOS general-purpose sampled-data analog processing element. IEEE Trans. Circ. Syst. II 47, 5 (2000), 467--473.","journal-title":"IEEE Trans. Circ. Syst."},{"key":"e_1_2_1_18_1","article-title":"A general-purpose processor-per-pixel analog SIMD vision chip","author":"Dudek Piotr","year":"2005","unstructured":"Piotr Dudek and Peter J. Hicks. 2005. 
A general-purpose processor-per-pixel analog SIMD vision chip. IEEE Trans. Circuits and Syst. I 52, 1 (2005), 13--20.","journal-title":"IEEE Trans. Circuits and Syst."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1142725.1711182"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/35.3-4.414"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","DOI":"10.1109\/82.539000","article-title":"Subexpression sharing in filters using canonic signed digit multipliers","author":"Hartley Richard I.","year":"1996","unstructured":"Richard I. Hartley. 1996. Subexpression sharing in filters using canonic signed digit multipliers. IEEE Trans. Circ. Syst. II 43, 10 (1996), 677--688.","journal-title":"IEEE Trans. Circ. Syst."},{"key":"e_1_2_1_22_1","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"volume-title":"Proceedings of the 3rd Caltech Conference on VLSI","author":"Leiserson Charles E.","key":"e_1_2_1_24_1","unstructured":"Charles E. 
Leiserson, Flavio M. Rose, and James B. Saxe. 1983. Optimizing synchronous circuitry by retiming. In Proceedings of the 3rd Caltech Conference on VLSI. 87."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-14313-2_47"},{"key":"e_1_2_1_26_1","unstructured":"Fei-Fei Li, Marco Andreetto, and Marc\u2019Aurelio Ranzato. 2003. Caltech101 image dataset. Retrieved from http:\/\/www.vision.caltech.edu\/Image_Datasets\/Caltech101\/."},{"key":"e_1_2_1_27_1","unstructured":"Rainer Lienhart. 2013. Haarcascade Frontalface Default. Retrieved from https:\/\/github.com\/opencv\/opencv\/blob\/master\/data\/haarcascades\/haarcascade_frontalface_default.xml."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.31"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.aat8084"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1021272100265"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3054944"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 2015 IEEE International Symposium on Circuits and Systems (ISCAS\u201915)","author":"Martel J. N. P.","year":"2015","unstructured":"J. N. P. Martel, M. Chau, P. Dudek, and M. Cook. 2015. Toward joint approximate inference of visual quantities on cellular processor arrays. 
In Proceedings of the 2015 IEEE International Symposium on Circuits and Systems (ISCAS\u201915). 2061--2064."},{"volume-title":"Proceedings of the 15th International Workshop on Cellular Nanoscale Networks and Their Applications (CNNA\u201916)","author":"Martel J. N. P.","key":"e_1_2_1_33_1","unstructured":"J. N. P. Martel, L. K. Mueller, S. J. Carey, and P. Dudek. 2016. A real-time high dynamic range vision system with tone mapping for automotive applications. In Proceedings of the 15th International Workshop on Cellular Nanoscale Networks and Their Applications (CNNA\u201916). 1--2."},{"key":"e_1_2_1_34_1","unstructured":"MIT. 2013. CBCL Face Database #1. Retrieved from http:\/\/www.ai.mit.edu\/projects\/cbcl."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786763.2694364"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/43.739059"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2009.5118161"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178500"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2778161"},{"volume-title":"Advances in Computers","author":"Reitwiesner George W.","key":"e_1_2_1_41_1","unstructured":"George W. Reitwiesner. 1960. Binary arithmetic. In Advances in Computers. Vol. 1. 
Elsevier, 231--308."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.12"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2594291.2594342"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2685394"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2001.990517"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1240233.1240234"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5201\/ipol.2014.104"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"volume-title":"Arithmetic Complexity of Computations","author":"Winograd Shmuel","key":"e_1_2_1_49_1","unstructured":"Shmuel Winograd. 1980. Arithmetic Complexity of Computations. Society for Industrial and Applied Mathematics. Philadelphia, PA."},{"volume-title":"Focal-Plane Sensor-Processor Chips","author":"Zar\u00e1ndy \u00c1kos","key":"e_1_2_1_50_1","unstructured":"\u00c1kos Zar\u00e1ndy. 2014. Focal-Plane Sensor-Processor Chips. 
Springer."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291055","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3291055","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:01:52Z","timestamp":1750208512000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291055"}},"subtitle":["Automatic Kernel Code Generation for an Analogue SIMD Focal-Plane Sensor-Processor Array"],"short-title":[],"issued":{"date-parts":[[2018,12,31]]},"references-count":50,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,12,31]]}},"alternative-id":["10.1145\/3291055"],"URL":"https:\/\/doi.org\/10.1145\/3291055","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2018,12,31]]},"assertion":[{"value":"2018-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-01-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}