{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T03:04:41Z","timestamp":1773803081040,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"26","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>As Convolutional Neural Networks (CNNs) continue to gain traction in deep learning, Winograd convolution has emerged as a key algorithm to enhance computational efficiency. Although ARM-based CPUs are increasingly prevalent in mobile devices, embedded systems and HPC servers, existing 2D Winograd convolution implementations for ARM often leave room for improvement in transformation efficiency, computational throughput, and overall versatility. Furthermore, the lack of tailored 3D Winograd convolution implementations for ARM architectures stems from the additional complexity of supporting higher-dimensional kernels. AirWino introduces a set of novel optimizations covering transformations, data layouts, micro-kernel computations, and parallelization strategies for both 2D and 3D Winograd convolution. It supports FP32 and FP16 precisions with filter sizes of 3 and 5, targeting a broad range of applications. 
Evaluations on four distinct ARM platforms show that AirWino consistently outperforms state-of-the-art libraries across various experimental scenarios and hardware configurations, highlighting its efficiency and portability.<\/jats:p>","DOI":"10.1609\/aaai.v40i26.39288","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:29:58Z","timestamp":1773797398000},"page":"21414-21422","source":"Crossref","is-referenced-by-count":0,"title":["AirWino: Optimized Winograd Convolution for Accelerating CNN Inference on ARMv8 Processors"],"prefix":"10.1609","volume":"40","author":[{"given":"Haoyuan","family":"Gui","sequence":"first","affiliation":[]},{"given":"Xiaoyu","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yifan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Ximeng","family":"Fu","sequence":"additional","affiliation":[]},{"given":"Shiqi","family":"Sun","sequence":"additional","affiliation":[]},{"given":"Leisheng","family":"Li","sequence":"additional","affiliation":[]},{"given":"Huiyuan","family":"Li","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial 
Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/39288\/43249","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/39288\/43249","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:29:58Z","timestamp":1773797398000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/39288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"26","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i26.39288","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}