{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T19:00:32Z","timestamp":1771700432581,"version":"3.50.1"},"reference-count":86,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2024,4,12]],"date-time":"2024-04-12T00:00:00Z","timestamp":1712880000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Research and Development Planning in Key Areas of Guangdong Province","award":["2021B0202070001"],"award-info":[{"award-number":["2021B0202070001"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Leveraging data from various modalities to enhance multimodal segmentation tasks is a well-regarded approach. Recently, efforts have been made to incorporate an array of modalities, including depth and thermal imaging. Nevertheless, the effective amalgamation of cross-modal interactions remains a challenge, given the unique traits each modality presents. In our current research, we introduce the semantic guidance fusion network (SGFN), which is an innovative cross-modal fusion network adept at integrating a diverse set of modalities. Particularly, the SGFN features a semantic guidance module (SGM) engineered to boost bi-modal feature extraction. It encompasses a learnable semantic guidance convolution (SGC) designed to merge intensity and gradient data from disparate modalities. Comprehensive experiments carried out on the NYU Depth V2, SUN-RGBD, Cityscapes, MFNet, and ZJU datasets underscore both the superior performance and generalization ability of the SGFN compared to the current leading models. 
Moreover, when tested on the DELIVER dataset, the efficiency of our bi-modal SGFN displayed a mIoU that is comparable to the hitherto leading model, CMNEXT.<\/jats:p>","DOI":"10.3390\/s24082473","type":"journal-article","created":{"date-parts":[[2024,4,12]],"date-time":"2024-04-12T04:44:26Z","timestamp":1712897066000},"page":"2473","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Semantic Guidance Fusion Network for Cross-Modal Semantic Segmentation"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-9061-4805","authenticated-orcid":false,"given":"Pan","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Information, Shanghai Ocean University, No. 999 Hucheng Ring Road, Shanghai 201306, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4393-6250","authenticated-orcid":false,"given":"Ming","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Information, Shanghai Ocean University, No. 999 Hucheng Ring Road, Shanghai 201306, China"}]},{"given":"Meng","family":"Gao","sequence":"additional","affiliation":[{"name":"College of Information, Shanghai Ocean University, No. 999 Hucheng Ring Road, Shanghai 201306, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,4,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"4444","DOI":"10.1109\/TCSVT.2021.3121680","article-title":"Stage-Aware Feature Alignment Network for Real-Time Semantic Segmentation of Street Scenes","volume":"32","author":"Weng","year":"2022","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"7880","DOI":"10.1109\/TCSVT.2022.3187664","article-title":"UrbanLF: A Comprehensive Light Field Dataset for Semantic Segmentation of Urban Scenes","volume":"32","author":"Sheng","year":"2022","journal-title":"IEEE Trans. Circuits Syst. 
Video Technol."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The Cityscapes Dataset for Semantic Urban Scene Understanding. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_4","unstructured":"Brooks, F. (1999, January 13\u201317). What\u2019s Real About Virtual Reality?. Proceedings of the IEEE Virtual Reality (Cat. No. 99CB36316), Virtual."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.neucom.2022.07.041","article-title":"GCNet: Grid-like context-aware network for RGB-thermal semantic segmentation","volume":"506","author":"Liu","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1481","DOI":"10.1109\/TCSVT.2023.3296162","article-title":"Pixel Difference Convolutional Network for RGB-D Semantic Segmentation","volume":"34","author":"Yang","year":"2023","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ha, Q., Watanabe, K., Karasawa, T., Ushiku, Y., and Harada, T. (2017, January 24\u201328). MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8206396"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Gupta, S., Girshick, R., Arbel\u00e1ez, P., and Malik, J. (2014, January 6\u201312). Learning rich features from RGB-D images for object detection and segmentation. Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland. 
Part VII.","DOI":"10.1007\/978-3-319-10584-0_23"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Cao, J., Leng, H., Lischinski, D., Cohen-Or, D., Tu, C., and Li, Y. (2021, January 11\u201317). ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00700"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wang, J., Wang, Z., Tao, D., See, S., and Wang, G. (2016, January 11\u201314). Learning common and specific features for RGB-D semantic segmentation with deconvolutional networks. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Part V.","DOI":"10.1007\/978-3-319-46454-1_40"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Chen, X., Lin, K.Y., Wang, J., Wu, W., Qian, C., Li, H., and Zeng, G. (2020, January 23\u201328). Bi-directional cross-modality feature propagation with separation-and-aggregation gate for RGB-D semantic segmentation. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58621-8_33"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Zhao, S., Luo, Y., Zhang, D., Huang, N., and Han, J. (2021, January 19\u201325). ABMDRNet: Adaptive-weighted Bi-directional Modality Difference Reduction Network for RGB-T Semantic Segmentation. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00266"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hazirbas, C., Ma, L., Domokos, C., and Cremers, D. (2016, January 20\u201324). Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture. Proceedings of the Computer Vision\u2013ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan. 
Revised Selected Papers, Part I.","DOI":"10.1007\/978-3-319-54181-5_14"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hu, X., Yang, K., Fei, L., and Wang, K. (2019, January 22\u201325). ACNET: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803025"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012, January 7\u201313). Indoor segmentation and support inference from rgbd images. Proceedings of the 12th European Conference on Computer Vision\u2014ECCV 2012, Florence, Italy. Part V.","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"4802","DOI":"10.1364\/OE.416130","article-title":"Polarization-driven semantic segmentation via efficient attention-bridged fusion","volume":"29","author":"Xiang","year":"2021","journal-title":"Opt. Express"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhang, J., Liu, R., Shi, H., Yang, K., Rei\u00df, S., Peng, K., Fu, H., Wang, K., and Stiefelhagen, R. (2023, January 17\u201324). Delivering Arbitrary-Modal Semantic Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00116"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1239","DOI":"10.1007\/s11263-019-01188-y","article-title":"Self-supervised model adaptation for multimodal semantic segmentation","volume":"128","author":"Valada","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Seichter, D., K\u00f6hler, M., Lewandowski, B., Wengefeld, T., and Gross, H.M. (June, January 30). Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis. 
Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9561675"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhang, J., Yang, K., and Stiefelhagen, R. (October, January 27). ISSAFE: Improving Semantic Segmentation in Accidents by Fusing Event-based Data. Proceedings of the 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic.","DOI":"10.1109\/IROS51168.2021.9636109"},{"key":"ref_21","first-page":"12077","article-title":"SegFormer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2014MICCAI 2015: 18th International Conference, Munich, Germany. Part III.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_25","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. 
arXiv."},{"key":"ref_26","unstructured":"Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Takikawa, T., Acuna, D., Jampani, V., and Fidler, S. (November, January 27). Gated-SCNN: Gated Shape CNNs for Semantic Segmentation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00533"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ding, H., Jiang, X., Liu, A.Q., Thalmann, N.M., and Wang, G. (November, January 27). Boundary-Aware Feature Propagation for Scene Segmentation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00692"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yuan, Y., Xie, J., Chen, X., and Wang, J. (2020, January 23\u201328). Segfix: Model-agnostic boundary refinement for segmentation. Proceedings of the 16th European Conference Computer Vision\u2014ECCV 2020, Glasgow, UK. 
Part XII.","DOI":"10.1007\/978-3-030-58610-2_29"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual Attention Network for Scene Segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zhang, Y., Liu, S., Shi, J., Loy, C.C., Lin, D., and Jia, J. (2018, January 8\u201314). PSANet: Point-wise Spatial Attention Network for Scene Parsing. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_17"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (2019, January 27\u201328). CCNet: Criss-Cross Attention for Semantic Segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00069"},{"key":"ref_35","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, January 3\u20137). An Image is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. Proceedings of the International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ranftl, R., Bochkovskiy, A., and Koltun, V. (2021, January 11\u201317). Vision Transformers for Dense Prediction. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01196"},{"key":"ref_37","first-page":"7281","article-title":"HRFormer: High-Resolution Vision Transformer for Dense Predict","volume":"Volume 34","author":"Ranzato","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Pang, B., and Lu, C. (2022, January 18\u201324). Semantic Segmentation by Early Region Proxy. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00132"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"He, H., Cai, J., Pan, Z., Liu, J., Zhang, J., Tao, D., and Zhuang, B. (2023, January 18\u201322). Dynamic Focus-aware Positional Queries for Semantic Segmentation. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01087"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Gu, J., Kwon, H., Wang, D., Ye, W., Li, M., Chen, Y.H., Lai, L., Chandra, V., and Pan, D.Z. (2022, January 18\u201324). Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01178"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Shivakumar, S.S., Rodrigues, N., Zhou, A., Miller, I.D., Kumar, V., and Taylor, C.J. (August, January 31). PST900: RGB-Thermal Calibration, Dataset and Segmentation Network. 
Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9196831"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"108881","DOI":"10.1016\/j.patcog.2022.108881","article-title":"Complementarity-aware cross-modal feature fusion network for RGB-T semantic segmentation","volume":"131","author":"Wu","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"7790","DOI":"10.1109\/TIP.2021.3109518","article-title":"GMNet: Graded-Feature Multilabel-Learning Network for RGB-Thermal Urban Scene Semantic Segmentation","volume":"30","author":"Zhou","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"2576","DOI":"10.1109\/LRA.2019.2904733","article-title":"RTFNet: RGB-Thermal Fusion Network for Semantic Segmentation of Urban Scenes","volume":"4","author":"Sun","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Kalra, A., Taamazyan, V., Rao, S.K., Venkataraman, K., Raskar, R., and Kadambi, A. (2020, January 13\u201319). Deep Polarization Cues for Transparent Object Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00863"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Mei, H., Dong, B., Dong, W., Yang, J., Baek, S.H., Heide, F., Peers, P., Wei, X., and Yang, X. (2022, January 19\u201320). Glass Segmentation Using Intensity and Spectral Polarization Cues. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01229"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Alonso, I., and Murillo, A.C. (2019, January 16\u201317). EV-SegNet: Semantic segmentation for event-based cameras. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00205"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"2606","DOI":"10.1109\/TITS.2021.3134828","article-title":"Exploring Event-Driven Dynamic Context for Accident Scene Segmentation","volume":"23","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"3737","DOI":"10.1109\/TCSVT.2023.3241641","article-title":"A Multi-Phase Camera-LiDAR Fusion Network for 3D Semantic Segmentation With Weak Supervision","volume":"33","author":"Chang","year":"2023","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"2068","DOI":"10.1109\/TCSVT.2021.3082763","article-title":"Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection for Autonomous Driving","volume":"32","author":"Yuan","year":"2022","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Rashed, H., Yogamani, S., El-Sallab, A., Krizek, P., and El-Helw, M. (2019). Optical flow augmented semantic segmentation networks for automated driving. arXiv.","DOI":"10.5220\/0007248300002108"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"14679","DOI":"10.1109\/TITS.2023.3300537","article-title":"CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers","volume":"24","author":"Zhang","year":"2023","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Juefei-Xu, F., Naresh Boddeti, V., and Savvides, M. (2017, January 21\u201326). Local binary convolutional neural networks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.456"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Zhang, X., Liu, L., Xie, Y., Chen, J., Wu, L., and Pietikainen, M. (2017, January 22\u201329). Rotation invariant local binary convolution neural networks. Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.146"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Yu, Z., Qin, Y., Zhao, H., Li, X., and Zhao, G. (2021). Dual-cross central difference network for face anti-spoofing. arXiv.","DOI":"10.24963\/ijcai.2021\/177"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"3005","DOI":"10.1109\/TPAMI.2020.3036338","article-title":"NAS-FAS: Static-Dynamic Central Difference Network Search for Face Anti-Spoofing","volume":"43","author":"Yu","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Yu, Z., Zhao, C., Wang, Z., Qin, Y., Su, Z., Li, X., Zhou, F., and Zhao, G. (2020, January 13\u201319). Searching central difference convolutional networks for face anti-spoofing. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00534"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Su, Z., Liu, W., Yu, Z., Hu, D., Liao, Q., Tian, Q., Pietik\u00e4inen, M., and Liu, L. (2021, January 11\u201317). Pixel Difference Networks for Efficient Edge Detection. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00507"},{"key":"ref_59","first-page":"8702","article-title":"Semantic diffusion network for semantic segmentation","volume":"35","author":"Tan","year":"2022","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_60","unstructured":"Sapiro, G. (1995, January 23\u201326). Geometric partial differential equations in image analysis: Past, present, and future. Proceedings of the International Conference on Image Processing, Washington, DC, USA."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Deng, F., Feng, H., Liang, M., Wang, H., Yang, Y., Gao, Y., Chen, J., Hu, J., Guo, X., and Lam, T.L. (October, January 27). FEANet: Feature-Enhanced Attention Network for RGB-Thermal Real-time Semantic Segmentation. Proceedings of the 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic.","DOI":"10.1109\/IROS51168.2021.9636084"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 11\u201317). Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_63","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 7\u20139). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_64","unstructured":"Glorot, X., Bordes, A., and Bengio, Y. (2011, January 11\u201313). Deep sparse rectifier neural networks. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics\u2014JMLR Workshop and Conference Proceedings, Fort Lauderdale, FL, USA."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Song, S., Lichtenberg, S.P., and Xiao, J. (2015, January 7\u201312). SUN RGB-D: A RGB-D scene understanding benchmark suite. 
Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298655"},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Xiao, T., Liu, Y., Zhou, B., Jiang, Y., and Sun, J. (2018, January 8\u201314). Unified perceptual parsing for scene understanding. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_26"},{"key":"ref_68","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Gupta, S., Arbel\u00e1ez, P., and Malik, J. (2013, January 23\u201328). Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.79"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Qi, X., Liao, R., Jia, J., Fidler, S., and Urtasun, R. (2017, January 22\u201329). 3D Graph Neural Networks for RGBD Semantic Segmentation. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.556"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Xu, D., Ouyang, W., Wang, X., and Sebe, N. (2018, January 18\u201323). Pad-net: Multi-tasks guided prediction-and-distillation network for simultaneous depth estimation and scene parsing. 
Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00077"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Cui, Z., Xu, C., Yan, Y., Sebe, N., and Yang, J. (2019, January 15\u201320). Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00423"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Yang, Y., Xu, Y., Zhang, C., Xu, Z., and Huang, J. (2022, January 25\u201327). Hierarchical Vision Transformer with Channel Attention for RGB-D Image Segmentation. Proceedings of the 4th International Symposium on Signal Processing Systems, Xi\u2019an, China.","DOI":"10.1145\/3532342.3532352"},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Wu, Z., Zhou, Z., Allibert, G., Stolz, C., Demonceaux, C., and Ma, C. (2022, October 18). Transformer Fusion for Indoor rgb-d Semantic Segmentation. SSRN. Available online: https:\/\/ssrn.com\/abstract=4251286.","DOI":"10.2139\/ssrn.4251286"},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"2313","DOI":"10.1109\/TIP.2021.3049332","article-title":"Spatial Information Guided Convolution for Real-Time RGBD Semantic Segmentation","volume":"30","author":"Chen","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"658","DOI":"10.1109\/LSP.2021.3066071","article-title":"Non-Local Aggregation for RGB-D Semantic Segmentation","volume":"28","author":"Zhang","year":"2021","journal-title":"IEEE Signal Process. 
Lett."},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"108468","DOI":"10.1016\/j.patcog.2021.108468","article-title":"CANet: Co-attention network for RGB-D semantic segmentation","volume":"124","author":"Zhou","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Or\u0161ic, M., Kre\u0161o, I., Bevandic, P., and \u0160egvic, S. (2019, January 15\u201320). In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01289"},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"5558","DOI":"10.1109\/LRA.2020.3007457","article-title":"Real-time fusion network for RGB-D semantic segmentation incorporating unexpected obstacle detection for road-driving images","volume":"5","author":"Sun","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1109\/TITS.2017.2750080","article-title":"ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation","volume":"19","author":"Romera","year":"2018","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., and Cottrell, G. (2018, January 12\u201315). Understanding convolution for semantic segmentation. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00163"},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"3349","DOI":"10.1109\/TPAMI.2020.2983686","article-title":"Deep High-Resolution Representation Learning for Visual Recognition","volume":"43","author":"Wang","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1016\/j.patrec.2021.03.015","article-title":"Attention fusion network for multi-spectral semantic segmentation","volume":"146","author":"Xu","year":"2021","journal-title":"Pattern Recognit. Lett."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Yan, R., Yang, K., and Wang, K. (2021, January 27\u201331). NLFNet: Non-Local Fusion Towards Generalized Multimodal Semantic Segmentation across RGB-Depth, Polarization, and Thermal Images. Proceedings of the 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO), Jinghong, China.","DOI":"10.1109\/ROBIO54168.2021.9739390"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Broedermann, T., Sakaridis, C., Dai, D., and Van Gool, L. (2022). HRFuser: A multi-resolution sensor fusion architecture for 2D object detection. arXiv.","DOI":"10.1109\/ITSC57777.2023.10422432"},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/8\/2473\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:26:53Z","timestamp":1760106413000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/8\/2473"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,12]]},"references-count":86,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2024,4]]}},"alternative-id":["s24082473"],"URL":"https:\/\/doi.org\/10.3390\/s24082473","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,12]]}}}