{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T23:35:32Z","timestamp":1761176132417,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"type":"electronic","value":"9781643686318"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>RGB-D video-based gesture recognition is a fundamental task in computer vision, yet it remains challenging due to the small size of hand regions and the presence of redundant background noise. Existing methods often fail to effectively suppress irrelevant background features and inadequately exploit the complementary nature of RGB and depth modalities, leading to suboptimal semantic alignment in feature fusion. To address these issues, we propose a novel end-to-end RGB-D gesture recognition framework that incorporates Spatiotemporal Background Suppression (STBS) and a Bidirectional Modality Fusion Adapter (BMFA). STBS leverages Vision Transformers to construct region-wise tokens and adaptively merges them based on response scores, suppressing irrelevant background content while preserving fine-grained gesture-related spatiotemporal features. Meanwhile, BMFA enables deep, bidirectional interaction between RGB and depth features across encoder layers, enhancing cross-modal semantic consistency. Extensive experiments on three public RGB-D gesture datasets validate the effectiveness of our approach and demonstrate significant improvements in recognition performance. 
Our code is available at https:\/\/github.com\/caicai211\/SB-BFM<\/jats:p>","DOI":"10.3233\/faia250859","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:44:16Z","timestamp":1761126256000},"source":"Crossref","is-referenced-by-count":0,"title":["Learning to Suppress Backgrounds and Bidirectionally Fuse Modalities for RGB-D Gesture Recognition"],"prefix":"10.3233","author":[{"given":"Xin","family":"Hu","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Xidian University, China"},{"name":"Xi\u2019an Key Laboratory of Big Data and Intelligent Vision, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunan","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Xidian University, China"},{"name":"Xi\u2019an Key Laboratory of Big Data and Intelligent Vision, China"},{"name":"Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yulang","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Xidian University, China"},{"name":"Xi\u2019an Key Laboratory of Big Data and Intelligent Vision, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yilin","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Xidian University, China"},{"name":"Xi\u2019an Key Laboratory of Big Data and Intelligent Vision, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zixiang","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Xidian University, China"},{"name":"Xi\u2019an Key Laboratory of Big Data and Intelligent Vision, China"},{"name":"Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qiguang","family":"Miao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Xidian University, China"},{"name":"Xi\u2019an Key Laboratory of Big Data and Intelligent Vision, China"},{"name":"Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA250859","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:44:16Z","timestamp":1761126256000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA250859"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia250859","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"type":"print","value":"0922-6389"},{"type":"electronic","value":"1879-8314"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}