{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,26]],"date-time":"2025-12-26T22:32:23Z","timestamp":1766788343535,"version":"3.37.3"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,8,25]],"date-time":"2023-08-25T00:00:00Z","timestamp":1692921600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,25]],"date-time":"2023-08-25T00:00:00Z","timestamp":1692921600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100007129","name":"Natural Science Foundation of Shandong Province","doi-asserted-by":"publisher","award":["ZR2020MF076"],"award-info":[{"award-number":["ZR2020MF076"]}],"id":[{"id":"10.13039\/501100007129","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Semantic segmentation plays a vital role in indoor scene analysis. Currently, its accuracy is still limited due to the complex conditions of various indoor scenes. In addition, it is difficult to complete this task solely relying on RGB images. Since depth images can provide additional 3D geometric information to RGB images, researchers chose to incorporate depth images for improving the accuracy of indoor semantic segmentation. However, it is still a challenge to effectively fuse the depth information with the RGB images. To address this issue, a three-stream coordinate attention network is proposed. 
The presented network reconstructs a multi-modal feature fusion module for RGB-D features, which aggregates the information of the two modalities along the spatial and channel dimensions. Meanwhile, three convolutional neural network branches form a parallel three-stream structure that processes the RGB features, depth features and fused features, respectively. On the one hand, the proposed network preserves the original RGB and depth feature streams simultaneously; on the other hand, it also utilizes and propagates the fused feature stream more effectively. An embedded ASPP module optimizes the semantic information in the proposed network, aggregating feature information at different scales to obtain more accurate features. Experimental results show that the proposed model reaches a state-of-the-art mIoU accuracy of 50.2% on the NYUDv2 dataset, as well as state-of-the-art results on the more complex SUN-RGBD dataset.<\/jats:p>","DOI":"10.1007\/s40747-023-01210-4","type":"journal-article","created":{"date-parts":[[2023,8,25]],"date-time":"2023-08-25T02:01:42Z","timestamp":1692928902000},"page":"1219-1230","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["TCANet: three-stream coordinate attention network for RGB-D indoor semantic 
segmentation"],"prefix":"10.1007","volume":"10","author":[{"given":"Weikuan","family":"Jia","sequence":"first","affiliation":[]},{"given":"Xingchao","family":"Yan","sequence":"additional","affiliation":[]},{"given":"Qiaolian","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Ting","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Xishang","family":"Dong","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,8,25]]},"reference":[{"issue":"17","key":"1210_CR1","doi-asserted-by":"publisher","first-page":"3493","DOI":"10.3390\/rs13173493","volume":"13","author":"J Pei","year":"2021","unstructured":"Pei J, Wang Z, Sun X et al (2021) FEF-Net: a deep learning approach to multiview SAR image target recognition. Remote Sens 13(17):3493","journal-title":"Remote Sens"},{"issue":"4","key":"1210_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3329784","volume":"52","author":"S Ghosh","year":"2019","unstructured":"Ghosh S, Das N, Das I et al (2019) Understanding deep learning techniques for image segmentation. ACM Comput Surv 52(4):1\u201335","journal-title":"ACM Comput Surv"},{"key":"1210_CR3","doi-asserted-by":"crossref","unstructured":"Silberman N, Fergus R (2011) Indoor scene segmentation using a structured light sensor. In: IEEE international conference on computer vision, pp 601\u2013608","DOI":"10.1109\/ICCVW.2011.6130298"},{"issue":"2","key":"1210_CR4","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/MMUL.2012.24","volume":"19","author":"Z Zhang","year":"2012","unstructured":"Zhang Z (2012) Microsoft kinect sensor and its effect. IEEE Multimed 19(2):4\u201310","journal-title":"IEEE Multimed"},{"key":"1210_CR5","doi-asserted-by":"crossref","unstructured":"Cheng Y, Cai R, Li Z et al (2017) Locality-sensitive deconvolution networks with gated fusion for RGB-D indoor semantic segmentation. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3029\u20133037","DOI":"10.1109\/CVPR.2017.161"},{"key":"1210_CR6","doi-asserted-by":"crossref","unstructured":"Zhou H, Qi L, Wan Z et al (2020) RGB-D co-attention network for semantic segmentation. In: Proceedings of the Asian conference on computer vision, pp 519\u2013536","DOI":"10.1007\/978-3-030-69525-5_31"},{"key":"1210_CR7","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431\u20133440","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"1210_CR8","doi-asserted-by":"crossref","unstructured":"Hazirbas C, Ma L, Domokos C et al (2016) Fusenet: incorporating depth into semantic segmentation via fusion-based cnn architecture. In: Asian conference on computer vision, pp 213\u2013228","DOI":"10.1007\/978-3-319-54181-5_14"},{"key":"1210_CR9","doi-asserted-by":"crossref","unstructured":"Seichter D, K\u00f6hler M, Lewandowski B et al (2021) Efficient rgb-d semantic segmentation for indoor scene analysis. In: 2021 IEEE international conference on robotics and automation (ICRA). IEEE, pp 13525\u201313531","DOI":"10.1109\/ICRA48506.2021.9561675"},{"issue":"4","key":"1210_CR10","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1109\/MIS.2020.2999462","volume":"36","author":"W Zhou","year":"2020","unstructured":"Zhou W, Yuan J, Lei J et al (2020) TSNet: three-stream self-attention network for RGB-D indoor semantic segmentation. IEEE Intell Syst 36(4):73\u201378","journal-title":"IEEE Intell Syst"},{"key":"1210_CR11","doi-asserted-by":"crossref","unstructured":"Hu X, Yang K, Fei L et al (2019) ACNET: attention based network to exploit complementary features for RGBD semantic segmentation. In: 2019 IEEE international conference on image processing. 
IEEE, pp 1440\u20131444","DOI":"10.1109\/ICIP.2019.8803025"},{"key":"1210_CR12","doi-asserted-by":"publisher","first-page":"2533","DOI":"10.1007\/s00521-018-3937-8","volume":"32","author":"S Malakar","year":"2020","unstructured":"Malakar S, Ghosh M, Bhowmik S et al (2020) A GA based hierarchical feature selection approach for handwritten word recognition. Neural Comput Appl 32:2533\u20132552","journal-title":"Neural Comput Appl"},{"issue":"21","key":"1210_CR13","doi-asserted-by":"publisher","first-page":"2705","DOI":"10.3390\/math9212705","volume":"9","author":"N Bacanin","year":"2021","unstructured":"Bacanin N, Stoean R, Zivkovic M et al (2021) Performance of a novel chaotic firefly algorithm with enhanced exploration for tackling global optimization problems: application for dropout regularization. Mathematics 9(21):2705","journal-title":"Mathematics"},{"key":"1210_CR14","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"1210_CR15","doi-asserted-by":"crossref","unstructured":"Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 13713\u201313722","DOI":"10.1109\/CVPR46437.2021.01350"},{"issue":"4","key":"1210_CR16","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","volume":"40","author":"LC Chen","year":"2017","unstructured":"Chen LC, Papandreou G, Kokkinos I et al (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFS. 
IEEE Trans Pattern Anal Mach Intell 40(4):834\u2013848","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1210_CR17","doi-asserted-by":"crossref","unstructured":"Cao J, Leng H, Lischinski D et al (2021) ShapeConv: shape-aware convolutional layer for indoor RGB-D semantic segmentation. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 7088\u20137097","DOI":"10.1109\/ICCV48922.2021.00700"},{"key":"1210_CR18","unstructured":"Park SJ, Hong KS, Lee S (2017) Rdfnet: RGB-D multi-level residual feature fusion for indoor semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 4980\u20134989"},{"key":"1210_CR19","doi-asserted-by":"crossref","unstructured":"Chen X, Lin K Y, Wang J et al (2020) Bi-directional cross-modality feature propagation with separation-and-aggregation gate for RGB-D semantic segmentation. In: European conference on computer vision, pp 561\u2013577","DOI":"10.1007\/978-3-030-58621-8_33"},{"issue":"10","key":"1210_CR20","doi-asserted-by":"publisher","first-page":"2642","DOI":"10.1109\/TPAMI.2019.2923513","volume":"42","author":"D Lin","year":"2019","unstructured":"Lin D, Huang H (2019) Zig-zag network for semantic segmentation of RGB-D images. IEEE Trans Pattern Anal Mach Intell 42(10):2642\u20132655","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1210_CR21","doi-asserted-by":"publisher","first-page":"1115","DOI":"10.1109\/LSP.2021.3084855","volume":"28","author":"Y Yue","year":"2021","unstructured":"Yue Y, Zhou W, Lei J et al (2021) Two-stage cascaded decoder for semantic segmentation of RGB-D images. IEEE Signal Process Lett 28:1115\u20131119","journal-title":"IEEE Signal Process Lett"},{"key":"1210_CR22","unstructured":"Chen S, Zhu X, Liu W et al (2021) Global-local propagation network for RGB-D semantic segmentation. 
arXiv:2101.10801"},{"key":"1210_CR23","unstructured":"Deng L, Yang M, Li T et al (2019) RFBNet: deep multimodal networks with residual fusion blocks for RGB-D semantic segmentation. arXiv:1907.00135"},{"key":"1210_CR24","doi-asserted-by":"crossref","unstructured":"Su Y, Yuan Y, Jiang Z (2021) Deep feature selection-and-fusion for RGB-D semantic segmentation. In: 2021 IEEE international conference on multimedia and expo. IEEE, pp 1\u20136","DOI":"10.1109\/ICME51207.2021.9428155"},{"key":"1210_CR25","doi-asserted-by":"crossref","unstructured":"Xing Y, Wang J, Zeng G (2020) Malleable 2.5 d convolution: learning receptive fields along the depth-axis for rgb-d scene parsing. In: European conference on computer vision, pp 555\u2013571","DOI":"10.1007\/978-3-030-58529-7_33"},{"key":"1210_CR26","doi-asserted-by":"crossref","unstructured":"Jiao J, Wei Y, Jie Z et al (2019) Geometry-aware distillation for indoor semantic segmentation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 2869\u20132878.","DOI":"10.1109\/CVPR.2019.00298"},{"key":"1210_CR27","doi-asserted-by":"crossref","unstructured":"Xiong Z, Yuan Y, Guo N et al (2020) Variational context-deformable convnets for indoor scene parsing. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 3992\u20134002","DOI":"10.1109\/CVPR42600.2020.00405"},{"key":"1210_CR28","unstructured":"Zhang Y, Yang Y, Xiong C et al (2022) Attention-based dual supervised decoder for RGBD semantic segmentation. arXiv:2201.01427"},{"key":"1210_CR29","doi-asserted-by":"crossref","unstructured":"Zhang C, Jiao J, Xu W et al. ADFNet: attention-based fusion network for few-shot RGB-D semantic segmentation. 
In: 2022 14th international conference on machine learning and computing (ICMLC), pp 91\u201396","DOI":"10.1145\/3529836.3529864"},{"key":"1210_CR30","unstructured":"Bai L, Yang J, Tian C et al (2022) DCANet: differential convolution attention network for RGB-D semantic segmentation. arXiv:2210.06747"},{"key":"1210_CR31","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"key":"1210_CR32","doi-asserted-by":"crossref","unstructured":"Yang C, Zhang L, Lu H et al (2013) Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3166\u20133173","DOI":"10.1109\/CVPR.2013.407"},{"key":"1210_CR33","doi-asserted-by":"crossref","unstructured":"Wang F, Jiang M, Qian C et al (2017) Residual attention network for image classification. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156\u20133164","DOI":"10.1109\/CVPR.2017.683"},{"key":"1210_CR34","unstructured":"Hu J, Shen L, Albanie S et al (2018) Gather-excite: exploiting feature context in convolutional neural networks. Advances in neural information processing systems, pp 9423\u20139433"},{"key":"1210_CR35","doi-asserted-by":"crossref","unstructured":"Woo S, Park J, Lee JY et al (2018) CBAM: convolutional block attention module. In: Proceedings of the European conference on computer vision, pp 3\u201319","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"1210_CR36","doi-asserted-by":"crossref","unstructured":"Fu J, Liu J, Tian H et al (2019) Dual attention network for scene segmentation. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 3146\u20133154","DOI":"10.1109\/CVPR.2019.00326"},{"key":"1210_CR37","doi-asserted-by":"crossref","unstructured":"Xing H, Xiao Z, Zhan D, Luo S, Dai P, Li K (2022) SelfMatch: robust semisupervised time-series classification with self-distillation. Int J Intell Syst 1\u201328","DOI":"10.1002\/int.22957"},{"key":"1210_CR38","doi-asserted-by":"crossref","unstructured":"Xiao Z, Zhang H, Tong H, Xu X (2022) An efficient temporal network with dual self-distillation for electroencephalography signal classification. In: 2022 IEEE international conference on bioinformatics and biomedicine (BIBM), Las Vegas, pp 1759\u20131762","DOI":"10.1109\/BIBM55620.2022.9995049"},{"key":"1210_CR39","doi-asserted-by":"crossref","unstructured":"Silberman N, Hoiem D, Kohli P et al (2012) Indoor segmentation and support inference from rgbd images. In: European conference on computer vision, pp 746\u2013760","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"1210_CR40","doi-asserted-by":"crossref","unstructured":"Song S, Lichtenberg S P, Xiao J (2015) Sun rgb-d: a rgb-d scene understanding benchmark suite. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 567\u2013576","DOI":"10.1109\/CVPR.2015.7298655"},{"key":"1210_CR41","doi-asserted-by":"crossref","unstructured":"Janoch A, Karayev S, Jia Y et al (2013) A category-level 3d object dataset: putting the kinect to work. Consumer depth cameras for computer vision, pp 141\u2013165","DOI":"10.1007\/978-1-4471-4640-7_8"},{"key":"1210_CR42","doi-asserted-by":"crossref","unstructured":"Xiao J, Owens A, Torralba A (2013) Sun3d: a database of big spaces reconstructed using sfm and object labels. 
In: Proceedings of the IEEE international conference on computer vision, pp 1625\u20131632","DOI":"10.1109\/ICCV.2013.458"},{"issue":"3","key":"1210_CR43","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","volume":"115","author":"O Russakovsky","year":"2015","unstructured":"Russakovsky O, Deng J, Su H et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211\u2013252","journal-title":"Int J Comput Vis"},{"key":"1210_CR44","unstructured":"Mao A, Mehryar M, Yutao Z (2023) Cross-entropy loss functions: theoretical analysis and applications. arXiv:2304.07288"},{"key":"1210_CR45","doi-asserted-by":"crossref","unstructured":"Qi X, Liao R, Jia J et al (2017) 3d graph neural networks for RGBD semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 5199\u20135208","DOI":"10.1109\/ICCV.2017.556"},{"key":"1210_CR46","doi-asserted-by":"crossref","unstructured":"Lin D, Chen G, Cohen-Or D et al (2017) Cascaded feature network for semantic segmentation of RGB-D images. In: Proceedings of the IEEE international conference on computer vision, pp 1311\u20131319","DOI":"10.1109\/ICCV.2017.147"},{"key":"1210_CR47","doi-asserted-by":"crossref","unstructured":"Wang W, Neumann U (2018) Depth-aware CNN for RGB-D segmentation. In: Proceedings of the European conference on computer vision, pp 135\u2013150","DOI":"10.1007\/978-3-030-01252-6_9"},{"key":"1210_CR48","doi-asserted-by":"crossref","unstructured":"Kong S, Fowlkes CC (2018) Recurrent scene parsing with perspective understanding in the loop. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 956\u2013965","DOI":"10.1109\/CVPR.2018.00106"},{"key":"1210_CR49","doi-asserted-by":"crossref","unstructured":"Lin G, Milan A, Shen C et al (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925\u20131934","DOI":"10.1109\/CVPR.2017.549"},{"key":"1210_CR50","unstructured":"Jiang J, Zheng L, Luo F et al (2018) Rednet: residual encoder-decoder network for indoor RGB-D semantic segmentation. arXiv:1806.01054"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01210-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01210-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01210-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,10]],"date-time":"2024-02-10T22:32:24Z","timestamp":1707604344000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01210-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,25]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,2]]}},"alternative-id":["1210"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01210-4","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2023,8,25]]},"assertion":[{"value":"10 October 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 August 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 August 2023","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}