{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:42:25Z","timestamp":1773801745872,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Scene recognition (SR) is a fundamental task in computer vision (CV). In recent years, Transformer-based methods have achieved remarkable success in scene recognition tasks. Most existing approaches primarily rely on visual features, while failing to effectively model the structural relationships within scenes, which are crucial for accurate scene recognition. To this end, we propose Topology Attention Network for Scene Recognition (TANSR), an innovative method that leverages topological relationships from graphs to guide scene recognition. Specifically, Graph Attention Mask Generation Network (GAMGN) generates topology-aware masks from graph representations constructed by Graph Generation Module (GGM) and integrates them with patch embeddings by Topology Attention Guidance (TAG), enabling the transformer's attention mechanism to incorporate topological information. Furthermore, we introduce an innovative attention-driven multimodal fusion strategy that integrates graph-derived topological cues with visual patch embeddings, substantially enhancing the transformer\u2019s capability to capture topological information and improving performance in complex scene recognition tasks. We evaluate TANSR on the benchmarks MIT-67, Scene-15 and SUN397, where it achieves consistent state-of-the-art (SOTA) performance, including 98.58% accuracy on MIT-67.<\/jats:p>","DOI":"10.1609\/aaai.v40i12.38001","type":"journal-article","created":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:55:15Z","timestamp":1773791715000},"page":"10315-10322","source":"Crossref","is-referenced-by-count":0,"title":["Topology-Aware Vision Transformers for Enhanced Scene Recognition"],"prefix":"10.1609","volume":"40","author":[{"given":"Yunxi","family":"Wang","sequence":"first","affiliation":[]},{"given":"Shuaiyu","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Qiling","family":"Li","sequence":"additional","affiliation":[]},{"given":"Yazhou","family":"Ren","sequence":"additional","affiliation":[]},{"given":"Xiaorong","family":"Pu","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38001\/41963","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38001\/41963","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:55:16Z","timestamp":1773791716000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/38001"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i12.38001","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}