{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T11:11:16Z","timestamp":1776424276398,"version":"3.51.2"},"reference-count":115,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2023,6,1]],"date-time":"2023-06-01T00:00:00Z","timestamp":1685577600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key Research and Development Project","award":["2018YFE0206500"],"award-info":[{"award-number":["2018YFE0206500"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Semantic segmentation is a critical task in computer vision that aims to assign each pixel in an image a corresponding label on the basis of its semantic content. This task is commonly referred to as dense labeling because it requires pixel-level classification of the image. The research area of semantic segmentation is vast and has achieved critical advances in recent years. Deep learning architectures in particular have shown remarkable performance in generating high-level, hierarchical, and semantic features from images. Among these architectures, convolutional neural networks have been widely used to address semantic segmentation problems. This work aims to review and analyze recent technological developments in image semantic segmentation. It provides an overview of traditional and deep-learning-based approaches and analyzes their structural characteristics, strengths, and limitations. Specifically, it focuses on technical developments in deep-learning-based 2D semantic segmentation methods proposed over the past decade and discusses current challenges in semantic segmentation. 
The future development direction of semantic segmentation and the potential research areas that need further exploration are also examined.<\/jats:p>","DOI":"10.3390\/fi15060205","type":"journal-article","created":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T01:33:54Z","timestamp":1685669634000},"page":"205","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["2D Semantic Segmentation: Recent Developments and Future Directions"],"prefix":"10.3390","volume":"15","author":[{"given":"Yu","family":"Guo","sequence":"first","affiliation":[{"name":"Wuhan University GNSS Research Center, Wuhan University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1271-7968","authenticated-orcid":false,"given":"Guigen","family":"Nie","sequence":"additional","affiliation":[{"name":"Wuhan University GNSS Research Center, Wuhan University, Wuhan 430079, China"},{"name":"Hubei Luojia Laboratory, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenliang","family":"Gao","sequence":"additional","affiliation":[{"name":"Wuhan University GNSS Research Center, Wuhan University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mi","family":"Liao","sequence":"additional","affiliation":[{"name":"Wuhan University GNSS Research Center, Wuhan University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/0600000079","article-title":"Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art","volume":"12","author":"Janai","year":"2020","journal-title":"Found. Trends\u00ae Comput. Graph. 
Vis."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Lu, X., Wang, W., Ma, C., Shen, J., Shao, L., and Porikli, F. (2019, January 15\u201320). See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00374"},{"key":"ref_3","first-page":"2228","article-title":"Zero-Shot Video Object Segmentation with Co-Attention Siamese Networks","volume":"44","author":"Lu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I. (2017, January 21\u201326). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Noh, H., Hong, S., and Han, B. (2015, January 11\u201318). Learning deconvolution network for semantic segmentation. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.178"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wei, Z., Sun, Y., Wang, J., Lai, H., and Liu, S. (2017, January 21\u201326). Learning adaptive receptive fields for deep image parsing network. Proceedings of the 2017 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.420"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Batra, A., Singh, S., Pang, G., Basu, S., Jawahar, C.V., and Paluri, M. (2019, January 16\u201320). Improved road connectivity by joint learning of orientation and segmentation. 
Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01063"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Farha, Y.A., and Gall, J. (2019, January 16\u201320). Ms-tcn: Multi-stage temporal convolutional network for action segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00369"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"107155","DOI":"10.1016\/j.compeleceng.2021.107155","article-title":"Multi-feature fusion network for road scene semantic segmentation","volume":"92","author":"Sun","year":"2021","journal-title":"Comput. Electr. Eng."},{"key":"ref_10","first-page":"36","article-title":"Review on semantic segmentation of road scenes","volume":"58","author":"Yanc","year":"2021","journal-title":"Laser Optoelectron. Prog."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.neucom.2021.08.105","article-title":"Lane-deeplab: Lane semantic segmentation in automatic driving scenarios for high-definition maps","volume":"465","author":"Li","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3145","DOI":"10.1007\/s13042-019-01005-5","article-title":"SegFast-V2: Semantic image segmentation with less parameters in deep learning for autonomous driving","volume":"10","author":"Ghosh","year":"2019","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mao, J., Xiao, T., Jiang, Y., and Cao, Z. (2017, January 21\u201326). What can help pedestrian detection?. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.639"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"108063","DOI":"10.1016\/j.patcog.2021.108063","article-title":"Weak segmentation supervised deep neural networks for pedestrian detection","volume":"119","author":"Guo","year":"2021","journal-title":"Pattern Recognit."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kampffmeyer, M., Salberg, A.B., and Jenssen, R. (July, January 26). Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPRW.2016.90"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ouyang, S., and Li, Y. (2020). Combining deep semantic segmentation network and graph convolutional neural network for semantic segmentation of remote sensing imagery. Remote Sens., 13.","DOI":"10.3390\/rs13010119"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Peng, S., Liu, Y., Huang, Q., Zhou, X., and Bao, H. (2019, January 16\u201320). Pvnet: Pixel-wise voting network for 6dof pose estimation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00469"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1584","DOI":"10.1109\/LSP.2022.3186594","article-title":"Segmentation-Based Background-Inference and Small-Person Pose Estimation","volume":"29","author":"Gao","year":"2022","journal-title":"IEEE Signal Process. 
Lett."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1007\/s00371-021-02075-9","article-title":"Contour-aware semantic segmentation network with spatial attention mechanism for medical image","volume":"38","author":"Cheng","year":"2022","journal-title":"Vis. Comput."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1007\/s10462-020-09854-1","article-title":"Deep semantic segmentation of natural and medical images: A review","volume":"54","author":"Abhishek","year":"2021","journal-title":"Artif. Intell. Rev."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"638182","DOI":"10.3389\/fonc.2021.638182","article-title":"Artificial convolutional neural network in object detection and semantic segmentation for medical imaging analysis","volume":"11","author":"Yang","year":"2021","journal-title":"Front. Oncol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1007\/s10916-018-1116-1","article-title":"Deep semantic segmentation of kidney and space-occupying lesion area based on SCNN and ResNet models combined with SIFT-flow algorithm","volume":"43","author":"Xia","year":"2019","journal-title":"J. Med. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_24","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. 
ACM"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Liu, F., Shen, C., and Lin, G. (2015, January 7\u201312). Deep convolutional neural fields for depth estimation from a single image. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299152"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Xu, D., Ricci, E., Ouyang, W., Wang, X., and Sebe, N. (2017, January 21\u201326). Multi-scale continuous crfs as sequential deep networks for monocular depth estimation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.25"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 11\u201318). Fully convolutional networks for semantic segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Santiago, Chile.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. 
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_32","unstructured":"Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E. (2016). Enet: A deep neural network architecture for real-time semantic segmentation. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Li, Y., Qi, H., Dai, J., Ji, X., and Wei, Y. (2017, January 21\u201326). Fully convolutional instance-aware semantic segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.472"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path aggregation network for instance segmentation. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Bolya, D., Zhou, C., Xiao, F., and Lee, Y.J. (2019, January 15\u201320). Yolact: Real-time instance segmentation. Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00925"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wang, X., Kong, T., Shen, C., Jiang, Y., and Li, L. (2020, January 23\u201328). Solo: Segmenting objects by locations. Proceedings of the 2020 European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58523-5_38"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"2375","DOI":"10.1007\/s11263-021-01465-9","article-title":"OCNet: Object context for semantic segmentation","volume":"129","author":"Yuan","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). 
U-net: Convolutional networks for biomedical image segmentation. Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_39","unstructured":"Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., and Liang, J. (2018). Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer."},{"key":"ref_40","unstructured":"Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_42","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"12","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_45","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). 
Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 2015 International Conference on Machine Learning, Lille, France."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"108290","DOI":"10.1016\/j.patcog.2021.108290","article-title":"Contextual ensemble network for semantic segmentation","volume":"122","author":"Zhou","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_48","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Fan, H., Xiong, B., Mangalam, K., Li, Y., Yan, Z., Malik, J., and Feichtenhofer, C. (2021, January 11\u201317). Multiscale vision transformers. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00675"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Yuan, L., Chen, Y., Wang, T., Yu, W., Shi, Y., Jiang, Z.H., Tay, F.E., Feng, J., and Yan, S. (2021, January 11\u201317). Tokens-to-token vit: Training vision transformers from scratch on imagenet. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00060"},{"key":"ref_51","first-page":"15908","article-title":"Transformer in transformer","volume":"34","author":"Han","year":"2021","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_52","first-page":"12992","article-title":"Glance-and-gaze vision transformer","volume":"34","author":"Yu","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Tian, Z., He, T., Shen, C., and Yan, Y. (2019, January 15\u201320). Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00324"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Jiao, J., Wei, Y., Jie, Z., Shi, H., Lau, R.W., and Huang, T.S. (2019, January 15\u201320). Geometry-aware distillation for indoor semantic segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00298"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_56","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"114532","DOI":"10.1016\/j.eswa.2020.114532","article-title":"Optimized HRNet for image semantic segmentation","volume":"174","author":"Wu","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Kim, D.S., Kim, Y.H., and Park, K.R. (2021). Semantic segmentation by multi-scale feature extraction based on grouped dilated convolution module. Mathematics, 9.","DOI":"10.3390\/math9090947"},{"key":"ref_59","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. 
arXiv."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Liu, Y., Chen, K., Liu, C., Qin, Z., Luo, Z., and Wang, J. (2019, January 15\u201320). Structured knowledge distillation for semantic segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00271"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.neucom.2021.01.086","article-title":"Real-time semantic segmentation via sequential knowledge distillation","volume":"439","author":"Wu","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"119049","DOI":"10.1109\/ACCESS.2021.3107841","article-title":"Robust Semantic Segmentation with Multi-Teacher Knowledge Distillation","volume":"9","author":"Amirkhani","year":"2021","journal-title":"IEEE Access"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"5363","DOI":"10.1109\/TIP.2021.3083113","article-title":"Double similarity distillation for semantic image segmentation","volume":"30","author":"Feng","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.neucom.2018.05.083","article-title":"Deep visual domain adaptation: A survey","volume":"312","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"105444","DOI":"10.1016\/j.knosys.2019.105444","article-title":"Knowledge based domain adaptation for semantic segmentation","volume":"193","author":"Zhang","year":"2020","journal-title":"Knowl.-Based Syst."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"3798","DOI":"10.1109\/TCSVT.2021.3116210","article-title":"Partial domain adaptation on semantic segmentation","volume":"32","author":"Tian","year":"2021","journal-title":"IEEE Trans. Circuits Syst. 
Video Technol."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1093\/nsr\/nwx106","article-title":"A brief introduction to weakly supervised learning","volume":"5","author":"Zhou","year":"2018","journal-title":"Natl. Sci. Rev."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"4037","DOI":"10.1109\/TPAMI.2020.2992393","article-title":"Self-supervised visual feature learning with deep neural networks: A survey","volume":"43","author":"Jing","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1109\/34.868688","article-title":"Normalized cuts and image segmentation","volume":"22","author":"Shi","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"978","DOI":"10.1016\/j.jvcir.2014.02.015","article-title":"Moving cast shadow detection using online sub-scene shadow modeling and object inner-edges analysis","volume":"25","author":"Wang","year":"2014","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"103306","DOI":"10.1016\/j.jvcir.2021.103306","article-title":"Visible and thermal images fusion architecture for few-shot semantic segmentation","volume":"80","author":"Bao","year":"2021","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_72","unstructured":"Bucher, M., Vu, T.H., Cord, M., and P\u00e9rez, P. (2019). Zero-shot semantic segmentation. Adv. Neural Inf. Process. Syst., 32."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Gu, Z., Zhou, S., Niu, L., Zhao, Z., and Zhang, L. (2020, January 12\u201316). Context-aware feature generation for zero-shot semantic segmentation. 
Proceedings of the 2020 28th ACM International Conference on Multimedia (MM), Seattle, WA, USA.","DOI":"10.1145\/3394171.3413593"},{"key":"ref_74","unstructured":"Li, B., Weinberger, K.Q., Belongie, S., Koltun, V., and Ranftl, R. (2022). Language-driven semantic segmentation. arXiv."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Zhang, H., and Ding, H. (2021, January 11\u201317). Prototypical matching and open set rejection for zero-shot semantic segmentation. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00689"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Xu, J., De Mello, S., Liu, S., Byeon, W., Breuel, T., Kautz, J., and Wang, X. (2022, January 18\u201324). GroupViT: Semantic Segmentation Emerges from Text Supervision. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01760"},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"1757","DOI":"10.1109\/TPAMI.2012.256","article-title":"Toward open set recognition","volume":"35","author":"Scheirer","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Pastore, G., Cermelli, F., Xian, Y., Mancini, M., Akata, Z., and Caputo, B. (2021, January 19\u201325). A closer look at self-training for zero-label semantic segmentation. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPRW53098.2021.00303"},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"5443","DOI":"10.1007\/s11042-021-11792-1","article-title":"Dual semantic-guided model for weakly-supervised zero-shot semantic segmentation","volume":"81","author":"Shen","year":"2022","journal-title":"Multimed. 
Tools Appl."},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Gu, Z., Zhou, S., Niu, L., Zhao, Z., and Zhang, L. (2022). From pixel to patch: Synthesize context-aware features for zero-shot semantic segmentation. IEEE Trans. Neural Netw. Learn. Syst., 1\u201315.","DOI":"10.1109\/TNNLS.2022.3145962"},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"1043","DOI":"10.1109\/TMI.2021.3131245","article-title":"Domain Adaptation Meets Zero-Shot Learning: An Annotation-Efficient Approach to Multi-Modality Medical Image Segmentation","volume":"41","author":"Bian","year":"2021","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_82","first-page":"14","article-title":"Attention Mechanism in Neural Networks","volume":"6","author":"Kosiorek","year":"2017","journal-title":"Robot. Ind."},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Lambert, J., Liu, Z., Sener, O., Hays, J., and Koltun, V. (2020, January 14\u201319). MSeg: A composite dataset for multi-domain semantic segmentation. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00295"},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Zhu, X., Cheng, D., Zhang, Z., Lin, S., and Dai, J. (November, January 27). An empirical study of spatial attention mechanisms in deep networks. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea.","DOI":"10.1109\/ICCV.2019.00679"},{"key":"ref_86","unstructured":"Li, H., Xiong, P., An, J., and Wang, L. (2018). Pyramid attention network for semantic segmentation. 
arXiv."},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"106370","DOI":"10.1016\/j.compag.2021.106370","article-title":"Semantic segmentation model of cotton roots in-situ image based on attention mechanism","volume":"189","author":"Kang","year":"2021","journal-title":"Comput. Electron. Agric."},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1007\/s13042-022-01517-7","article-title":"A hybrid-attention semantic segmentation network for remote sensing interpretation in land-use surveillance","volume":"14","author":"Lv","year":"2022","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"2010","DOI":"10.1109\/TPAMI.2015.2505311","article-title":"Joint feature selection and subspace learning for cross-modal retrieval","volume":"38","author":"Wang","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Yang, M., Rosenhahn, B., and Murino, V. (2019). Multimodal Scene Understanding: Algorithms, Applications and Deep Learning, Academic Press.","DOI":"10.1016\/B978-0-12-817358-9.00007-X"},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"104042","DOI":"10.1016\/j.imavis.2020.104042","article-title":"Deep multimodal fusion for semantic image segmentation: A survey","volume":"105","author":"Zhang","year":"2021","journal-title":"Image Vis. 
Comput."},{"key":"ref_93","doi-asserted-by":"crossref","first-page":"1224","DOI":"10.1109\/TCSVT.2021.3077058","article-title":"ECFFNet: Effective and consistent feature fusion network for RGB-T salient object detection","volume":"32","author":"Zhou","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Patel, N., Choromanska, A., Krishnamurthy, P., and Khorrami, F. (2017, January 24\u201328). Sensor modality fusion with CNNs for UGV autonomous driving in indoor environments. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8205958"},{"key":"ref_95","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/j.inffus.2021.10.008","article-title":"A novel multimodal fusion network based on a joint coding model for lane line segmentation","volume":"80","author":"Zou","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref_96","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1016\/j.patrec.2008.04.005","article-title":"Semantic object classes in video: A high-definition ground truth database","volume":"30","author":"Brostow","year":"2009","journal-title":"Pattern Recognit. Lett."},{"key":"ref_97","unstructured":"Larsson, M., Stenborg, E., Hammarstrand, L., Pollefeys, M., Sattler, T., and Kahl, F. (2019, January 15\u201320). A cross-season correspondence dataset for robust semantic segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA."},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Orsic, M., Kreso, I., Bevandic, P., and Segvic, S. (2019, January 15\u201320). In defense of pre-trained imagenet architectures for real-time semantic segmentation of road-driving images. 
Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01289"},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the 2014 European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_100","doi-asserted-by":"crossref","unstructured":"Hu, Y.T., Chen, H.S., Hui, K., Huang, J.B., and Schwing, A.G. (2019, January 16\u201320). Sail-vos: Semantic amodal instance level video object segmentation-a synthetic dataset and baselines. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00322"},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The cityscapes dataset for semantic urban scene understanding. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_102","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","article-title":"The pascal visual object classes challenge: A retrospective","volume":"111","author":"Everingham","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_103","doi-asserted-by":"crossref","unstructured":"Hariharan, B., Arbel\u00e1ez, P., Bourdev, L., Maji, S., and Malik, J. (2011, January 6-13). Semantic contours from inverse detectors. 
Proceedings of the 2011 International Conference on Computer Vision (ICCV), Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126343"},{"key":"ref_104","doi-asserted-by":"crossref","unstructured":"Brostow, G.J., Shotton, J., Fauqueur, J., and Cipolla, R. (2008, January 12\u201318). Segmentation and recognition using structure from motion point clouds. Proceedings of the 2008 European Conference on Computer Vision (ECCV), Berlin, Germany.","DOI":"10.1007\/978-3-540-88682-2_5"},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Miao, J., Wei, Y., Wu, Y., Liang, C., Li, G., and Yang, Y. (2021, January 19\u201325). Vspw: A large-scale dataset for video scene parsing in the wild. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Electr Network, Virtual.","DOI":"10.1109\/CVPR46437.2021.00412"},{"key":"ref_106","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1109\/TMI.2004.825627","article-title":"Ridge-based vessel segmentation in color images of the retina","volume":"23","author":"Staal","year":"2004","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_107","doi-asserted-by":"crossref","first-page":"1993","DOI":"10.1109\/TMI.2014.2377694","article-title":"The multimodal brain tumor image segmentation benchmark (BRATS)","volume":"34","author":"Menze","year":"2014","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_108","doi-asserted-by":"crossref","first-page":"2868","DOI":"10.1109\/JSTARS.2016.2582921","article-title":"Semantic labeling of aerial and satellite imagery","volume":"9","author":"Paisitkriangkrai","year":"2016","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_109","doi-asserted-by":"crossref","unstructured":"Maggiori, E., Tarabalka, Y., Charpiat, G., and Alliez, P. (2017, January 23\u201328). Can semantic labeling methods generalize to any city? The inria aerial image labeling benchmark. 
Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8127684"},{"key":"ref_110","doi-asserted-by":"crossref","unstructured":"Miao, L., and Zhang, Y. (2021). A hierarchical feature extraction network for fast scene segmentation. Sensors, 21.","DOI":"10.3390\/s21227730"},{"key":"ref_111","doi-asserted-by":"crossref","unstructured":"Huang, S., Lu, Z., Cheng, R., and He, C. (2021, January 10\u201317). Fapn: Feature-aligned pyramid network for dense image prediction. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00090"},{"key":"ref_112","unstructured":"Hong, Y., Pan, H., Sun, W., and Jia, Y. (2021). Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv."},{"key":"ref_113","doi-asserted-by":"crossref","first-page":"5617","DOI":"10.1002\/int.22804","article-title":"Mifnet: A lightweight multiscale information fusion network","volume":"37","author":"Cheng","year":"2021","journal-title":"Int. J. Intell. Syst."},{"key":"ref_114","unstructured":"Chen, Z., Duan, Y., Wang, W., He, J., Lu, T., Dai, J., and Qiao, Y. (2022). Vision transformer adapter for dense predictions. arXiv."},{"key":"ref_115","doi-asserted-by":"crossref","first-page":"2547","DOI":"10.1109\/TNNLS.2020.3006524","article-title":"Scene segmentation with dual relation-aware attention network","volume":"32","author":"Fu","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. 
Syst."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/15\/6\/205\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:47:27Z","timestamp":1760125647000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/15\/6\/205"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,1]]},"references-count":115,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2023,6]]}},"alternative-id":["fi15060205"],"URL":"https:\/\/doi.org\/10.3390\/fi15060205","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,1]]}}}