{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T16:17:03Z","timestamp":1771517823003,"version":"3.50.1"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:p>The combination of Spiking Neural Networks (SNNs) with Vision Transformer architectures has attracted significant attention due to its great potential for energy-efficient and high-performance computing paradigms. However, a substantial performance gap still exists between SNN-based and ANN-based transformer architectures. Although existing methods have successfully combined spiking self-attention mechanisms with SNNs, the overall architectures they propose suffer from a bottleneck in effectively extracting features at different image scales. In this paper, we address this issue and propose MSVIT, a novel spike-driven Transformer architecture, which is the first to use multi-scale spiking attention (MSSA) to enrich the capability of spiking attention blocks. We validate our approach on various mainstream datasets. The experimental results indicate that MSVIT outperforms existing SNN-based models, positioning itself as a state-of-the-art solution among SNN-transformer architectures. The code is available at https:\/\/github.com\/Nanhu-AI-Lab\/MSViT.<\/jats:p>","DOI":"10.24963\/ijcai.2025\/601","type":"proceedings-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T08:10:40Z","timestamp":1758269440000},"page":"5399-5407","source":"Crossref","is-referenced-by-count":1,"title":["MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion"],"prefix":"10.24963","author":[{"given":"Wei","family":"Hua","sequence":"first","affiliation":[{"name":"China Nanhu Academy of Electronics and Information Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chenlin","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jibin","family":"Wu","sequence":"additional","affiliation":[{"name":"The Hong Kong Polytechnic University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yansong","family":"Chua","sequence":"additional","affiliation":[{"name":"China Nanhu Academy of Electronics and Information Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yangyang","family":"Shu","sequence":"additional","affiliation":[{"name":"The University of New South Wales"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"name":"Thirty-Fourth International Joint Conference on Artificial Intelligence {IJCAI-25}","theme":"Artificial Intelligence","location":"Montreal, Canada","acronym":"IJCAI-2025","number":"34","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2025,8,16]]},"end":{"date-parts":[[2025,8,22]]}},"container-title":["Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T11:34:34Z","timestamp":1758627274000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2025\/601"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2025,9]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2025\/601","relation":{},"subject":[],"published":{"date-parts":[[2025,9]]}}}