{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T09:08:11Z","timestamp":1781255291100,"version":"3.54.1"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,11,13]],"date-time":"2024-11-13T00:00:00Z","timestamp":1731456000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2024,11,13]],"date-time":"2024-11-13T00:00:00Z","timestamp":1731456000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2025,1]]},"DOI":"10.1007\/s11227-024-06530-x","type":"journal-article","created":{"date-parts":[[2024,11,13]],"date-time":"2024-11-13T18:27:44Z","timestamp":1731522464000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Optimizing computer vision algorithms with TVM on VLIW architecture based on RVV"],"prefix":"10.1007","volume":"81","author":[{"given":"Meng-Shiun","family":"Yu","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hao-Chun","family":"Chang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chong-Teng","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yu-Wei","family":"Tien","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tai-Liang","family":"Chen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jenq-Kuen","family":"Lee","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,11,13]]},"reference":[{"key":"6530_CR1","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25"},{"key":"6530_CR2","unstructured":"Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861"},{"key":"6530_CR3","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 779\u2013788","DOI":"10.1109\/CVPR.2016.91"},{"key":"6530_CR4","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D., Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Proceedings, Part I 14, Springer, pp 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"6530_CR5","doi-asserted-by":"crossref","unstructured":"Deng J, Guo J, Xue N, Zafeiriou S (2019) Arcface: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 4690\u20134699","DOI":"10.1109\/CVPR.2019.00482"},{"key":"6530_CR6","doi-asserted-by":"crossref","unstructured":"Barsoum E, Zhang C, Ferrer CC, Zhang Z (2016) Training deep networks for facial expression recognition with crowd-sourced label distribution. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp 279\u2013283","DOI":"10.1145\/2993148.2993165"},{"key":"6530_CR7","doi-asserted-by":"crossref","unstructured":"Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G (2018) Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, pp 1451\u20131460","DOI":"10.1109\/WACV.2018.00163"},{"key":"6530_CR8","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30"},{"key":"6530_CR9","doi-asserted-by":"crossref","unstructured":"Yu M-S, Yuan C-Y, Chen T-L, Lee J-K (2024) Case study: optimization methods with TVM hybrid-op on RISC-V packed SIMD. IEEE Access","DOI":"10.1109\/ACCESS.2024.3397195"},{"key":"6530_CR10","doi-asserted-by":"crossref","unstructured":"Prakash S, Callahan T, Bushagour J, Banbury C, Green AV, Warden P, Ansell T, Reddi VJ (2023) Cfu playground: full-stack open-source framework for tiny machine learning (TinyML) acceleration on FPGAs. In: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, pp 157\u2013167","DOI":"10.1109\/ISPASS57527.2023.00024"},{"key":"6530_CR11","unstructured":"Giri D, Chiu K-L, Eichler G, Mantovani P, Chandramoorth N, Carloni LP (2020) Ariane+ nvdla: seamless third-party IP integration with esp. In: Workshop on Computer Architecture Research with RISC-V (CARRV)"},{"key":"6530_CR12","doi-asserted-by":"crossref","unstructured":"Lamberti L, Rusci M, Fariselli M, Paci F, Benini L (2021) Low-power license plate detection and recognition on a RISC-V multi-core MCU-based vision system. In: 2021 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, pp 1\u20135","DOI":"10.1109\/ISCAS51556.2021.9401730"},{"key":"6530_CR13","unstructured":"Chen T, Moreau T, Jiang Z, Zheng L, Yan E, Shen H, Cowan M, Wang L, Hu Y, Ceze L, et al. (2018) $$\\{$$TVM$$\\}$$: an automated $$\\{$$End-to-End$$\\}$$ optimizing compiler for deep learning. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp 578\u2013594"},{"key":"6530_CR14","unstructured":"Foundation R-V. RISC-V Vector Extension Proposal. Online (2021). https:\/\/github.com\/riscv\/riscv-v-spec\/releases\/tag\/v1.0"},{"key":"6530_CR15","unstructured":"Foundation R-V. RISC-V. Online (2021). https:\/\/riscv.org\/"},{"key":"6530_CR16","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1016\/j.sysarc.2016.11.005","volume":"76","author":"P Meloni","year":"2017","unstructured":"Meloni P, Rubattu C, Tuveri G, Pani D, Raffo L, Palumbo F (2017) Real-time neural signal decoding on heterogeneous MPSsocs based on VLIW ASIPs. J Syst Archit 76:89\u2013101","journal-title":"J Syst Archit"},{"issue":"1","key":"6530_CR17","first-page":"1","volume":"11","author":"K-Y Hsieh","year":"2012","unstructured":"Hsieh K-Y, Lai C-H, Lai S-H, Lee JK (2012) Parallelization of belief propagation on cell processors for stereo vision. ACM Trans Embed Comput Syst (TECS) 11(1):1\u201315","journal-title":"ACM Trans Embed Comput Syst (TECS)"},{"issue":"5","key":"6530_CR18","doi-asserted-by":"publisher","first-page":"517","DOI":"10.1002\/cpe.1845","volume":"24","author":"C-B Kuan","year":"2012","unstructured":"Kuan C-B, Lee JK (2012) Compiler supports for VLIW DSP processors with SIMD intrinsics. Concurr Comput: Pract Exp 24(5):517\u2013532","journal-title":"Concurr Comput: Pract Exp"},{"issue":"2","key":"6530_CR19","first-page":"1","volume":"23","author":"S-C Wang","year":"2017","unstructured":"Wang S-C, Kan L-C, Lee C-L, Hwang Y-S, Lee J-K (2017) Architecture and compiler support for GPUs using energy-efficient affine register files. ACM Trans Des Autom Electron Syst (TODAES) 23(2):1\u201325","journal-title":"ACM Trans Des Autom Electron Syst (TODAES)"},{"issue":"3","key":"6530_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3569939","volume":"28","author":"C-C Yang","year":"2023","unstructured":"Yang C-C, Chen Y-R, Liao H-H, Chang Y-M, Lee J-K (2023) Auto-tuning fixed-point precision with TVM on RISC-V packed SIMD extension. ACM Trans Des Autom Electron Syst 28(3):1\u201321","journal-title":"ACM Trans Des Autom Electron Syst"},{"key":"6530_CR21","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1007\/s11265-010-0470-0","volume":"62","author":"DC-W Chang","year":"2011","unstructured":"Chang DC-W, Lin T-J, Wu C-J, Lee J-K, Chu Y-H, Wu A-Y (2011) Parallel architecture core (PAC)-the first multicore application processor SoC in Taiwan part I: hardware architecture & software development tools. J Sign Process Syst 62:373\u2013382","journal-title":"J Sign Process Syst"},{"key":"6530_CR22","unstructured":"Weisstein EW (2004) Affine transformation. https:\/\/mathworld. wolfram. com\/"},{"key":"6530_CR23","doi-asserted-by":"crossref","unstructured":"Rakhmawati L et al (2018) Image privacy protection techniques: a survey. In: TENCON 2018-2018 IEEE Region 10 Conference, IEEE, pp 0076\u20130080","DOI":"10.1109\/TENCON.2018.8650339"},{"key":"6530_CR24","unstructured":"Kong Z, Yang X, He L (2020) A comprehensive comparison of multi-dimensional image denoising methods. arXiv preprint arXiv:2011.03462"},{"key":"6530_CR25","volume-title":"Fundamentals of digital image processing","author":"AK Jain","year":"1989","unstructured":"Jain AK (1989) Fundamentals of digital image processing. Prentice-Hall Inc"},{"key":"6530_CR26","first-page":"35783","volume":"35","author":"J Shao","year":"2022","unstructured":"Shao J, Zhou X, Feng S, Hou B, Lai R, Jin H, Lin W, Masuda M, Yu CH, Chen T (2022) Tensor program optimization with probabilistic programs. Adv Neural Inf Process Syst 35:35783\u201335796","journal-title":"Adv Neural Inf Process Syst"},{"key":"6530_CR27","unstructured":"Chen T, Zheng L, Yan E, Jiang Z, Moreau T, Ceze L, Guestrin C, Krishnamurthy A (2018) Learning to optimize tensor programs. Adv Neural Inf Process Syst 31"},{"key":"6530_CR28","unstructured":"Zheng L, Jia C, Sun M, Wu Z, Yu CH, Haj-Ali A, Wang Y, Yang J, Zhuo D, Sen K et al (2020) Ansor: generating $$\\{$$High-Performance$$\\}$$ tensor programs for deep learning. In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pp 863\u2013879"},{"key":"6530_CR29","unstructured":"SiFive: SiFive Performance P650\/P670. Online (2024). https:\/\/www.sifive.com\/cores\/performance-p650-670"},{"key":"6530_CR30","doi-asserted-by":"crossref","unstructured":"Codrescu L et al (2013) Qualcomm hexagon DSP: an architecture optimized for mobile multimedia and communications. In: Hot Chips Symposium, pp 1\u201323","DOI":"10.1109\/HOTCHIPS.2013.7478317"},{"key":"6530_CR31","unstructured":"Alok G (2020) Architecture apocalypse dream architecture for deep learning inference and compute-versal ai core. Embedded World"},{"key":"6530_CR32","unstructured":"SIX C (2022) Developing an llvm backend for the kv3 kalray vliw core. In: EuroLLVM"},{"key":"6530_CR33","unstructured":"Foundation R-V. Spike. Online (2021). https:\/\/github.com\/riscv-software-src\/riscv-isa-sim"},{"key":"6530_CR34","unstructured":"Bellard F (2005) Qemu, a fast and portable dynamic translator. In: USENIX Annual Technical Conference, FREENIX Track, California, vol. 41, pp 10\u20135555"},{"key":"6530_CR35","unstructured":"Lowe-Power J, Ahmad AM, Akram A, Alian M, Amslinger R, Andreozzi M, Armejach A, Asmussen N, Beckmann B, Bharadwaj S, et al (2020) The gem5 simulator: version 20.0+. arXiv preprint arXiv:2007.03152"},{"key":"6530_CR36","unstructured":"Lattner TM (June 2005) An implementation of swing modulo scheduling with extensions for superblocks. Master\u2019s thesis, Computer Science Dept., University of Illinois at Urbana-Champaign, Urbana, IL. See http:\/\/llvm.cs.uiuc.edu"},{"key":"6530_CR37","doi-asserted-by":"publisher","unstructured":"Llosa J, Gonzalez A, Ayguade E, Valero M (1996) Swing module scheduling: a lifetime-sensitive approach. In: Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique, pp 80\u201386. https:\/\/doi.org\/10.1109\/PACT.1996.554030","DOI":"10.1109\/PACT.1996.554030"},{"issue":"3","key":"6530_CR38","doi-asserted-by":"publisher","first-page":"779","DOI":"10.1002\/cpe.3051","volume":"26","author":"C-J Wu","year":"2014","unstructured":"Wu C-J, Lu C-H, Lee JK (2014) Register spilling via transformed interference equations for PAC DSP architecture. Concurr Comput: Pract Exp 26(3):779\u2013799","journal-title":"Concurr Comput: Pract Exp"},{"key":"6530_CR39","doi-asserted-by":"publisher","unstructured":"Li H, Mentens N, Picek S (2022) A scalable simd risc-v based processor with customized vector extensions for crystals-kyber. In: Proceedings of the 59th ACM\/IEEE Design Automation Conference. DAC \u201922, Association for Computing Machinery, New York, pp 733\u2013738. https:\/\/doi.org\/10.1145\/3489517.3530552","DOI":"10.1145\/3489517.3530552"},{"issue":"3","key":"6530_CR40","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1145\/3007787.3001177","volume":"44","author":"Y-H Chen","year":"2016","unstructured":"Chen Y-H, Emer J, Sze V (2016) Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. ACM SIGARCH Comput Archit News 44(3):367\u2013379","journal-title":"ACM SIGARCH Comput Archit News"},{"key":"6530_CR41","first-page":"101","volume":"68","author":"A HajiRassouliha","year":"2018","unstructured":"HajiRassouliha A, Taberner AJ, Nash MP, Nielsen PM (2018) Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms. Sign Process: Image Commun 68:101\u2013119","journal-title":"Sign Process: Image Commun"},{"key":"6530_CR42","doi-asserted-by":"crossref","unstructured":"Gupta I, Kaur DP (2022) Fpga based feature extraction in real time computer vision-a comprehensive survey. In: 2022 International Conference on Signal and Information Processing (IConSIP), IEEE, pp 1\u20137","DOI":"10.1109\/ICoNSIP49665.2022.10007508"},{"issue":"2","key":"6530_CR43","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1145\/3140659.3080254","volume":"45","author":"A Parashar","year":"2017","unstructured":"Parashar A, Rhu M, Mukkara A, Puglielli A, Venkatesan R, Khailany B, Emer J, Keckler SW, Dally WJ (2017) Scnn: an accelerator for compressed-sparse convolutional neural networks. ACM SIGARCH Comput Archit News 45(2):27\u201340","journal-title":"ACM SIGARCH Comput Archit News"},{"key":"6530_CR44","unstructured":"Imperas: Imperas RISC-V tests. Github Repo. https:\/\/github.com\/riscv-ovpsim\/imperas-riscv-tests"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-024-06530-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-024-06530-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-024-06530-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,13]],"date-time":"2024-11-13T19:04:33Z","timestamp":1731524673000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-024-06530-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,13]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["6530"],"URL":"https:\/\/doi.org\/10.1007\/s11227-024-06530-x","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,13]]},"assertion":[{"value":"17 September 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 November 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"172"}}