{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T03:51:53Z","timestamp":1773546713137,"version":"3.50.1"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"name":"Key Area R & D Program of Guangdong Province","award":["2022B0701180001"],"award-info":[{"award-number":["2022B0701180001"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,3,31]]},"abstract":"<jats:p>As a core technology in real-time video processing and intelligent surveillance, stereo matching provides essential depth perception capabilities for multimedia applications. However, high-precision stereo networks often come with significant computational costs, making real-time inference on power- and memory-constrained edge devices challenging. On the other hand, lightweight real-time networks still struggle with accuracy limitations. To address this challenge, we propose RCAENet, a high-performance stereo network designed for real-time and high-accuracy depth estimation on edge devices. To enhance feature extraction efficiency, we introduce the Residual Convolutional Feature Extraction (RCFE) module, which replaces conventional convolutional layers to capture more expressive features while maintaining computational efficiency. Additionally, we propose the Enhanced Adaptive Upsampling (EAU) module, which integrates channel and spatial attention mechanisms to improve feature fusion and disparity refinement. Furthermore, we design an Enhanced 3D CNN (E3DC) along with the Cost Aggregation and Residual Attention (CA-ResAgg) module for cost volume regularization. This module incorporates residual aggregation and efficient channel attention to further enhance disparity estimation accuracy. Built upon these components, RCAENet features a multi-scale architecture that effectively balances accuracy and efficiency. Extensive experiments demonstrate that these innovations enable RCAENet to achieve real-time inference on edge devices while maintaining state-of-the-art depth accuracy.<\/jats:p>","DOI":"10.1145\/3788678","type":"journal-article","created":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T14:07:04Z","timestamp":1768831624000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["RCAENet: Residual Convolutional and Attention-Enhanced Stereo Matching for Real-Time Depth Estimation on Edge Devices"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2200-9228","authenticated-orcid":false,"given":"Bifa","family":"Liang","sequence":"first","affiliation":[{"name":"School of Integrated Circuits, Sun Yat-sen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1346-9644","authenticated-orcid":false,"given":"Yichao","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Integrated Circuits, Sun Yat-sen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-6831-6429","authenticated-orcid":false,"given":"Ziyang","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Integrated Circuits, Sun Yat-sen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4947-1589","authenticated-orcid":false,"given":"Zhicong","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4884-323X","authenticated-orcid":false,"given":"Haifeng","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5609-1410","authenticated-orcid":false,"given":"Jianming","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Integrated Circuits, Sun Yat-sen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5432-8149","authenticated-orcid":false,"given":"Dihu","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Integrated Circuits, Sun Yat-sen University, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2026,2,27]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1314303.1314309"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3321513"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2422956.2422960"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913491297"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354978"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1014573219977"},{"key":"e_1_3_1_8_2","first-page":"1","article-title":"Playing to vision foundation model\u2019s strengths in stereo matching","author":"Liu Chuang-Wei","year":"2024","unstructured":"Chuang-Wei Liu, Qijun Chen, and Rui Fan. 2024. Playing to vision foundation model\u2019s strengths in stereo matching. IEEE Transactions on Intelligent Vehicles (2024), 1\u201312.","journal-title":"IEEE Transactions on Intelligent Vehicles"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_35"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794003"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9196784"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3088635"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3102109"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2023.3344947"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.17"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00567"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00027"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00339"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10611085"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10160441"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01253"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2020.2973813"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00614"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3026899"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3050092"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3335480"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02623"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.438"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCI.2021.3139328"},{"key":"e_1_3_1_33_2","unstructured":"Ziyang Chen Yongjun Zhang Wenting Li Bingshu Wang Yabo Wu Yong Zhao and C. L. Philip Chen. 2025. Hadamard attention recurrent transformer: A strong baseline for stereo matching transformer. arXiv2501.01023. Retrieved from https:\/\/arxiv.org\/abs\/2501.01023"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01287-w"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01290"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00297"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00203"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9197031"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01231"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8803514"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-023-04646-w"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01647"},{"key":"e_1_3_1_43_2","unstructured":"Ziyang Chen Yongjun Zhang Wenting Li Bingshu Wang Yong Zhao and C. L. Philip Chen. 2024. Motif channel opened in a white-box: Stereo matching via Motif correlation graph. arXiv2411.12426. Retrieved from https:\/\/arxiv.org\/abs\/2411.12426"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00448"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00028"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV51458.2022.00075"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2023.3276328"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1049\/ipr2.12807"},{"key":"e_1_3_1_49_2","first-page":"50828","article-title":"A convolutional attention residual network for stereo matching","year":"2020","unstructured":"Guangyi \u202fHuang, Yongyi \u202fGong, Qingzhen \u202fXu, Kanoksak \u202fWattanachote, Kun \u202fZeng, and Xiaonan \u202fLuo. 2020. A convolutional attention residual network for stereo matching. IEEE Access 8 (2020), 50828\u201350842.","journal-title":"IEEE Access"},{"key":"e_1_3_1_50_2","unstructured":"Xianda Guo Chenming Zhang Youmin Zhang Wenzhao Zheng Dujun Nie Matteo Poggi and Long Chen. 2024. LightStereo: Channel boost is all your need for efficient 2D cost aggregation. arXiv2406.19833. Retrieved from https:\/\/arxiv.org\/abs\/2406.19833"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01863"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/3DV62453.2024.00083"},{"key":"e_1_3_1_53_2","unstructured":"Ce Liu Suryansh Kumar Shuhang Gu Radu Timofte Yao Yao and Luc Van Gool. 2024. Stereo risk: A continuous modeling approach to stereo matching. arXiv2407.03152. Retrieved from https:\/\/arxiv.org\/abs\/2407.03152"},{"key":"e_1_3_1_54_2","doi-asserted-by":"crossref","unstructured":"Hualie Jiang Zhiqiang Lou Laiyan Ding Rui Xu Minglang Tan Wenjie Jiang and Rui Huang. 2025. DEFOM-Stereo: Depth foundation model based stereo matching. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPR52734.2025.02036"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2025.130069"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2025.3540282"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2025.3569218"},{"key":"e_1_3_1_58_2","doi-asserted-by":"crossref","unstructured":"Junda Cheng Longliang Liu Gangwei Xu Xianqi Wang Zhaoxing Zhang Yong Deng Jinliang Zang Yurui Chen Zhipeng Cai and Xin Yang. 2025. MonSter: Marry monodepth to stereo unleashes power. In Proceedings of the Computer Vision and Pattern Recognition Conference.","DOI":"10.1109\/CVPR52734.2025.00588"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2024.129002"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3488719"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-11752-2_3"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2019.04.002"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2025.3535386"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.4892"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3788678","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T03:47:59Z","timestamp":1773546479000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3788678"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,27]]},"references-count":63,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,3,31]]}},"alternative-id":["10.1145\/3788678"],"URL":"https:\/\/doi.org\/10.1145\/3788678","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,27]]},"assertion":[{"value":"2025-03-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-03","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}