{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T17:03:33Z","timestamp":1749661413421,"version":"3.37.3"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2023,5,9]],"date-time":"2023-05-09T00:00:00Z","timestamp":1683590400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,9]],"date-time":"2023-05-09T00:00:00Z","timestamp":1683590400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100011688","name":"ECSEL Joint Undertaking","doi-asserted-by":"crossref","award":["H2020-ECSEL-2017-2-783162"],"award-info":[{"award-number":["H2020-ECSEL-2017-2-783162"]}],"id":[{"id":"10.13039\/501100011688","id-type":"DOI","asserted-by":"crossref"}]},{"name":"University of Turku (UTU) including Turku University Central Hospital"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Sign Process Syst"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this paper a method of accelerating image processing using convolution engines with reduced precision calculation is presented. The convolution engines are designed to be used with the Pulpissimo platform with RISC-V System-on-Chip. The aim is to move the calculation to the edge. The proposed linear convolution engines operate on 8-bit data set and the logarithmic convolution engine operates on 4-bit reduced precision data. The data reduction is done by using a logarithmic number space. Diminishing the size of the data to be processed reduces the amount of required memory, requirement for memory bandwidth, required computation, and required hardware area while simultaneously increasing the performance. This performance could benefit modern AI and image processing applications, especially in mobile and other battery-operated devices. The results show that the computation in the linear convolution engine is 91 times faster and computation in the logarithmic convolution engine is 122 times faster than in the RISC-V core with plain RISC-V instructions.\n<\/jats:p>","DOI":"10.1007\/s11265-023-01869-5","type":"journal-article","created":{"date-parts":[[2023,5,9]],"date-time":"2023-05-09T01:43:05Z","timestamp":1683596585000},"page":"1115-1126","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Accelerating Image Processing Using Reduced Precision Calculation Convolution Engines"],"prefix":"10.1007","volume":"95","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4753-1621","authenticated-orcid":false,"given":"Narayan","family":"Pokhrel","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sakari","family":"Sn\u00e4ll","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9412-0393","authenticated-orcid":false,"given":"Olli I.","family":"Heimo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Uruj","family":"Sarwar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1010-4386","authenticated-orcid":false,"given":"Antti","family":"Airola","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tero","family":"S\u00e4ntti","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,5,9]]},"reference":[{"key":"1869_CR1","unstructured":"Chitradevi, B., & Srimathi, P. (2014). An overview on image processing techniques. International Journal of Innovative Research in Computer and Communication Engineering, (2(11), pp.6466\u20136472)."},{"key":"1869_CR2","unstructured":"Viswanathan, V., & Hussein, R. (2017). Applications of image processing and real-time embedded systems in autonomous cars: a short review. International Journal of Image Processing (IJIP), (11(2), pp. 35)."},{"key":"1869_CR3","doi-asserted-by":"publisher","unstructured":"Cardoso, J. M., Carvalho, T., Coutinho, J. G., Luk, W., Nobre, R., Diniz, P., & Petrov, Z. (2012). LARA: an aspect-oriented programming language for embedded systems. In Proceedings of the 11th annual international conference on Aspect-oriented Software Development (pp. 179\u2013190). https:\/\/doi.org\/10.1145\/2162049.2162071","DOI":"10.1145\/2162049.2162071"},{"key":"1869_CR4","doi-asserted-by":"publisher","unstructured":"Qadeer, W., Hameed, R., Shacham, O., Venkatesan, P., Kozyrakis, C., & Horowitz, M. A. (2013). Convolution engine: balancing efficiency & flexibility in specialized computing. In Proceedings of the 40th Annual International Symposium on Computer Architecture (pp. 24\u201335). https:\/\/doi.org\/10.1145\/2485922.2485925","DOI":"10.1145\/2485922.2485925"},{"key":"1869_CR5","doi-asserted-by":"crossref","unstructured":"Wu, B., Wan, A., Yue, X., Jin, P., Zhao, S., Golmant, N., Gholaminejad, A., Gonzalez, J., & Keutzer, K. (2018). Shift: A zero flop, zero parameter alternative to spatial convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 9127\u20139135).","DOI":"10.1109\/CVPR.2018.00951"},{"key":"1869_CR6","doi-asserted-by":"publisher","unstructured":"IEEE (2019). \u201cIEEE standard for floating-point arithmetic\u201d, IEEE Std 754\u20132019 (Revision of IEEE 754\u20132008), pp. 1\u201384, 2019. https:\/\/doi.org\/10.1109\/IEEESTD.2019.8766229","DOI":"10.1109\/IEEESTD.2019.8766229"},{"key":"1869_CR7","doi-asserted-by":"publisher","unstructured":"Khwa, W.S., Chen, J.J., Li, J.F., Si, X., Yang, E.Y., Sun, X., Liu, R., Chen, P.Y., Li, Q., Yu, S. & Chang, M.F., 2018, February. A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3 ns and 55.8 TOPS\/W fully parallel product-sum operation for binary DNN edge processors. In 2018 IEEE International Solid-State Circuits Conference-(ISSCC) (pp. 496\u2013498). https:\/\/doi.org\/10.1109\/ISSCC.2018.8310401","DOI":"10.1109\/ISSCC.2018.8310401"},{"key":"1869_CR8","doi-asserted-by":"publisher","unstructured":"Wiercioch-Kuzianik, K., & B\u0105bel, P. (2019). Color hurts. The effect of color on pain perception. Pain Medicine, (20(10), pp. 1955\u20131962). https:\/\/doi.org\/10.1093\/pm\/pny285","DOI":"10.1093\/pm\/pny285"},{"key":"1869_CR9","doi-asserted-by":"crossref","unstructured":"Nixon, M., & Aguado, A. (2019). Feature extraction and image processing for computer vision. Academic press.","DOI":"10.1016\/B978-0-12-814976-8.00003-8"},{"key":"1869_CR10","unstructured":"Ma, Y., Suda, N., Cao, Y., Seo, J. S., & Vrudhula, S. (2016). Scalable and modularized RTL compilation of convolutional neural networks onto FPGA. In 2016 26th International Conference on Field Programmable Logic and Applications (FPL) (pp. 1\u20138). IEEE."},{"key":"1869_CR11","doi-asserted-by":"publisher","unstructured":"Qadeer, W., Hameed, R., Shacham, O., Venkatesan, P., Kozyrakis, C., & Horowitz, M. A. (2015). Convolution engine: balancing efficiency and flexibility in specialized computing. In Communications of the ACM (pp. 85\u201393). https:\/\/doi.org\/10.1145\/2735841","DOI":"10.1145\/2735841"},{"key":"1869_CR12","doi-asserted-by":"publisher","unstructured":"LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural computation, (1(4), pp. 541\u2013551). https:\/\/doi.org\/10.1162\/neco.1989.1.4.541","DOI":"10.1162\/neco.1989.1.4.541"},{"key":"1869_CR13","doi-asserted-by":"publisher","unstructured":"Khan, A., Sohail, A., Zahoora, U., & Qureshi, A. S. (2020). A survey of the recent architectures of deep convolutional neural networks. Artificial Intelligence Review, (53(8), pp. 5455\u20135516). https:\/\/doi.org\/10.1007\/s10462-020-09825-6","DOI":"10.1007\/s10462-020-09825-6"},{"key":"1869_CR14","unstructured":"Thompson, N. C., Greenewald, K., Lee, K., & Manso, G. F. (2020). The computational limits of deep learning. arXiv preprint arXiv:2007.05558."},{"key":"1869_CR15","doi-asserted-by":"publisher","unstructured":"Liu, B., Zou, D., Feng, L., Feng, S., Fu, P., & Li, J. (2019). An fpga-based cnn accelerator integrating depthwise separable convolution. Electronics, (8(3), pp. 281). https:\/\/doi.org\/10.3390\/electronics8030281","DOI":"10.3390\/electronics8030281"},{"key":"1869_CR16","unstructured":"Judd, P., Albericio, J., Hetherington, T., Aamodt, T., Jerger, N. E., Urtasun, R., & Moshovos, A. (2015). Reduced-precision strategies for bounded memory in deep neural nets. arXiv preprint arXiv:1511.05236."},{"key":"1869_CR17","unstructured":"Kim, M. (2019). Energy-Efficient ASIC Accelerators for Machine\/Deep Learning Algorithms (Doctoral dissertation, Arizona State University)."},{"key":"1869_CR18","unstructured":"Mairal, J. (2016). End-to-end kernel learning with supervised convolutional kernel networks. In 30th Conference on Neural Information Processing Systems (NIPS 2016), (pp. 1\u201316)."},{"key":"1869_CR19","doi-asserted-by":"publisher","unstructured":"Blomqvist, K., Kaski, S., & Heinonen, M. (2019). Deep convolutional Gaussian processes. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 582\u2013597). Springer, Cham. https:\/\/doi.org\/10.1007\/978-3-030-46147-8_35","DOI":"10.1007\/978-3-030-46147-8_35"},{"key":"1869_CR20","doi-asserted-by":"publisher","unstructured":"Dosovitskiy, A., Fischer, P., Springenberg, J. T., Riedmiller, M., & Brox, T. (2015). Discriminative unsupervised feature learning with exemplar convolutional neural networks. IEEE transactions on pattern analysis and machine intelligence, (38(9), pp. 1734\u20131747). https:\/\/doi.org\/10.1109\/TPAMI.2015.2496141","DOI":"10.1109\/TPAMI.2015.2496141"},{"key":"1869_CR21","doi-asserted-by":"publisher","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. nature, (518(7540), pp. 529\u2013533). https:\/\/doi.org\/10.1038\/nature14236","DOI":"10.1038\/nature14236"},{"key":"1869_CR22","unstructured":"Miyashita, D., Lee, E. H., & Murmann, B. (2016). Convolutional neural networks using logarithmic data representation. arXiv preprint arXiv:1603.01025."},{"key":"1869_CR23","doi-asserted-by":"publisher","unstructured":"Lee, E. H., Miyashita, D., Chai, E., Murmann, B., & Wong, S. S. (2017). Lognet: Energy-efficient neural networks using logarithmic computation. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5900\u20135904). IEEE. https:\/\/doi.org\/10.1109\/ICASSP.2017.7953288","DOI":"10.1109\/ICASSP.2017.7953288"},{"key":"1869_CR24","doi-asserted-by":"publisher","unstructured":"Vogel, S., Liang, M., Guntoro, A., Stechele, W., & Ascheid, G. (2018). Efficient hardware acceleration of CNNs using logarithmic data representation with arbitrary log-base. In Proceedings of the International Conference on Computer-Aided Design (pp. 1\u20138). https:\/\/doi.org\/10.1145\/3240765.3240803","DOI":"10.1145\/3240765.3240803"},{"key":"1869_CR25","unstructured":"GitHub, Inc. (2021). PULPissimo. Retrieved January 13, 2021, from https:\/\/github.com\/pulp-platform\/pulpissimo"},{"key":"1869_CR26","doi-asserted-by":"crossref","unstructured":"Waterman, A., Lee, Y., Patterson, D. A., & Asanovi, K. (2014). The risc-v instruction set manual. volume 1: User-level isa, version 2.0. California Univ Berkeley Dept of Electrical Engineering and Computer Sciences.","DOI":"10.21236\/ADA605735"},{"key":"1869_CR27","unstructured":"Stallman, R. M. (1988). Using the GNU Compiler Collection. For GCC version, (4(2)). Retrieved January 13, 2021, from https:\/\/gcc.gnu.org\/onlinedocs\/gcc.pdf"},{"key":"1869_CR28","unstructured":"Fletcher C, W. (2009). Interfaces: Fifo (a.k.a. ready\/valid). Retrieved March 3, 2021, from https:\/\/inst.eecs.berkeley.edu\/~cs150\/Documents\/Interfaces.pdf"},{"key":"1869_CR29","doi-asserted-by":"publisher","unstructured":"Gautschi, M., Schiavone, P. D., Traber, A., Loi, I., Pullini, A., Rossi, D., Flamand, E., G\u00fcrkaynak, F. K., & Benini, L. (2017). Near-threshold RISC-V core with DSP extensions for scalable IoT endpoint devices. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, (25(10), pp. 2700\u20132713). https:\/\/doi.org\/10.1109\/TVLSI.2017.2654506","DOI":"10.1109\/TVLSI.2017.2654506"},{"key":"1869_CR30","doi-asserted-by":"publisher","unstructured":"The FitOptiVis ECSEL project: highly efficient distributed embedded image\/video processing in cyber-physical systems, ACM Int'l Conf. on Computing Frontiers, 2019, pp. 333\u2013338, https:\/\/doi.org\/10.1145\/3310273.3323437","DOI":"10.1145\/3310273.3323437"}],"container-title":["Journal of Signal Processing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-023-01869-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11265-023-01869-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-023-01869-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T08:18:37Z","timestamp":1696839517000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11265-023-01869-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,9]]},"references-count":30,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["1869"],"URL":"https:\/\/doi.org\/10.1007\/s11265-023-01869-5","relation":{},"ISSN":["1939-8018","1939-8115"],"issn-type":[{"type":"print","value":"1939-8018"},{"type":"electronic","value":"1939-8115"}],"subject":[],"published":{"date-parts":[[2023,5,9]]},"assertion":[{"value":"1 April 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 March 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 April 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 May 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of Interest\/Competing Interests"}}]}}