{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T10:49:28Z","timestamp":1775472568376,"version":"3.50.1"},"reference-count":50,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2023,3,11]],"date-time":"2023-03-11T00:00:00Z","timestamp":1678492800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of PR China","doi-asserted-by":"publisher","award":["42075130"],"award-info":[{"award-number":["42075130"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>The segmentation algorithm for buildings and waters is extremely important for the efficient planning and utilization of land resources. The temporal and space range of remote sensing pictures is growing. Due to the generic convolutional neural network\u2019s (CNN) insensitivity to the spatial position information in remote sensing images, certain location and edge details can be lost, leading to a low level of segmentation accuracy. This research suggests a double-branch parallel interactive network to address these issues, fully using the interactivity of global information in a Swin Transformer network, and integrating CNN to capture deeper information. Then, by building a cross-scale multi-level fusion module, the model can combine features gathered using convolutional neural networks with features derived using Swin Transformer, successfully extracting the semantic information of spatial information and context. Then, an up-sampling module for multi-scale fusion is suggested. It employs the output high-level feature information to direct the low-level feature information and recover the high-resolution pixel-level features. According to experimental results, the proposed networks maximizes the benefits of the two models and increases the precision of semantic segmentation of buildings and waters.<\/jats:p>","DOI":"10.3390\/rs15061536","type":"journal-article","created":{"date-parts":[[2023,3,13]],"date-time":"2023-03-13T03:03:57Z","timestamp":1678676637000},"page":"1536","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":55,"title":["Double Branch Parallel Network for Segmentation of Buildings and Waters in Remote Sensing Images"],"prefix":"10.3390","volume":"15","author":[{"given":"Jing","family":"Chen","sequence":"first","affiliation":[{"name":"Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, B-DAT, Nanjing University of Information Science and Technology, Nanjing 210044, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4681-9129","authenticated-orcid":false,"given":"Min","family":"Xia","sequence":"additional","affiliation":[{"name":"Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, B-DAT, Nanjing University of Information Science and Technology, Nanjing 210044, China"}]},{"given":"Dehao","family":"Wang","sequence":"additional","affiliation":[{"name":"Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, B-DAT, Nanjing University of Information Science and Technology, Nanjing 210044, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3835-6075","authenticated-orcid":false,"given":"Haifeng","family":"Lin","sequence":"additional","affiliation":[{"name":"College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,11]]},"reference":[{"key":"ref_1","first-page":"102940","article-title":"DPCC-Net: Dual-perspective change contextual network for change detection in high-resolution remote sensing images","volume":"112","author":"Shu","year":"2022","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1109\/JSTARS.2022.3224081","article-title":"Axial Cross Attention Meets CNN: Bi-Branch Fusion Network for Change Detection","volume":"16","author":"Song","year":"2023","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_3","first-page":"103103","article-title":"WaterHRNet: A multibranch hierarchical attentive network for water body extraction with remote sensing images","volume":"115","author":"Yu","year":"2022","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"104940","DOI":"10.1016\/j.cageo.2021.104940","article-title":"Strip pooling channel spatial attention network for the segmentation of cloud and cloud shadow","volume":"157","author":"Qu","year":"2021","journal-title":"Comput. Geosci."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"6149","DOI":"10.1007\/s00521-021-06802-0","article-title":"Multi-scale strip pooling feature aggregation network for cloud and cloud shadow segmentation","volume":"34","author":"Lu","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Hu, K., Li, M., Xia, M., and Lin, H. (2022). Multi-Scale Feature Aggregation Network for Water Area Segmentation. Remote Sens., 14.","DOI":"10.3390\/rs14010206"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"876065","DOI":"10.3389\/fnins.2022.876065","article-title":"O-Net: A novel framework with deep fusion of CNN and transformer for simultaneous segmentation and classification","volume":"16","author":"Wang","year":"2022","journal-title":"Front. Neurosci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"5917","DOI":"10.1080\/01431161.2021.2022805","article-title":"SGBNet: An Ultra Light-weight Network for Real-time Semantic Segmentation of Land Cover","volume":"43","author":"Pang","year":"2022","journal-title":"Int. J. Remote Sens."},{"key":"ref_9","first-page":"102881","article-title":"Semi-supervised semantic segmentation framework with pseudo supervisions for land-use\/land-cover mapping in coastal areas","volume":"112","author":"Chen","year":"2022","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"5940","DOI":"10.1080\/01431161.2021.2014077","article-title":"Cloud\/shadow segmentation based on multi-level feature enhanced network for remote sensing imagery","volume":"43","author":"Miao","year":"2022","journal-title":"Int. J. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3155","DOI":"10.1109\/TPWRD.2021.3124528","article-title":"Parameter Identification in Power Transmission Systems Based on Graph Convolution Network","volume":"37","author":"Wang","year":"2022","journal-title":"IEEE Trans. Power Deliv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Chen, B., Xia, M., and Huang, J. (2021). Mfanet: A multi-level feature aggregation network for semantic segmentation of land cover. Remote Sens., 13.","DOI":"10.3390\/rs13040731"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ma, Z., Xia, M., Weng, L., and Lin, H. (2023). Local Feature Search Network for Building and Water Segmentation of Remote Sensing Image. Sustainability, 15.","DOI":"10.3390\/su15043034"},{"key":"ref_14","first-page":"1","article-title":"Semi-supervised locality preserving dense graph neural network with ARMA filters and context-aware learning for hyperspectral image classification","volume":"60","author":"Ding","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"119508","DOI":"10.1016\/j.eswa.2023.119508","article-title":"Multireceptive field: An adaptive path aggregation graph neural framework for hyperspectral image classification","volume":"217","author":"Zhang","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1016\/j.neucom.2022.06.031","article-title":"Multi-feature fusion: Graph neural network and CNN combining for hyperspectral image classification","volume":"501","author":"Ding","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5874","DOI":"10.1080\/01431161.2022.2073795","article-title":"MANet: A multi-level aggregation network for semantic segmentation of high-resolution remote sensing images","volume":"43","author":"Chen","year":"2022","journal-title":"Int. J. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Hu, K., Weng, C., Zhang, Y., Jin, J., and Xia, Q. (2022). An Overview of Underwater Vision Enhancement: From Traditional Methods to Recent Deep Learning. J. Mar. Sci. Eng., 10.","DOI":"10.3390\/jmse10020241"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Hu, K., Ding, Y., Jin, J., Weng, L., and Xia, M. (2022). Skeleton Motion Recognition Based on Multi-Scale Deep Spatio-Temporal Features. Appl. Sci., 12.","DOI":"10.3390\/app12031028"},{"key":"ref_20","first-page":"1","article-title":"Self-supervised locality preserving low-pass graph convolutional embedding for large-scale hyperspectral image clustering","volume":"60","author":"Ding","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_22","first-page":"1","article-title":"Dual-branch Network for Cloud and Cloud Shadow Segmentation","volume":"60","author":"Lu","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I. (2017, January 21\u201326). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 15\u201320). Deep high-resolution representation learning for human pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_28","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_29","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_30","first-page":"9355","article-title":"Twins: Revisiting the design of spatial attention in vision transformers","volume":"34","author":"Chu","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_32","unstructured":"Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., and Wang, M. (2021). Swin-unet: Unet-like pure transformer for medical image segmentation. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"016513","DOI":"10.1117\/1.JRS.16.016513","article-title":"MLNet: Multichannel feature fusion lozenge network for land segmentation","volume":"16","author":"Gao","year":"2022","journal-title":"J. Appl. Remote Sens."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_35","unstructured":"Li, G., Yun, I., Kim, J., and Kim, J. (2019). Dabnet: Depth-wise asymmetric bottleneck for real-time semantic segmentation. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Ma, N., Zhang, X., Zheng, H.T., and Sun, J. (2018, January 8\u201314). Shufflenet v2: Practical guidelines for efficient cnn architecture design. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1007\/s11263-021-01515-2","article-title":"Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation","volume":"129","author":"Yu","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Fang, L., Liu, J., Liu, J., and Mao, R. (2018, January 2\u20134). Automatic segmentation and 3d reconstruction of spine based on fcn and marching cubes in ct volumes. Proceedings of the 2018 10th International Conference on Modelling, Identification and Control (ICMIC), Guiyang, China.","DOI":"10.1109\/ICMIC.2018.8529993"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201322). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Jiang, W., Wu, Y., Guan, L., and Zhao, J. (2019, January 20\u201324). Dfnet: Semantic segmentation on panoramic images with dynamic loss weights and residual fusion block. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794476"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 16\u201320). Dual attention network for scene segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Peng, Z., Huang, W., Gu, S., Xie, L., Wang, Y., Jiao, J., and Ye, Q. (2021, January 10\u201317). Conformer: Local features coupling global representations for visual recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00042"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Mehta, S., Rastegari, M., Shapiro, L., and Hajishirzi, H. (2019, January 16\u201320). Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00941"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., and Xu, C. (2020, January 14\u201319). Ghostnet: More features from cheap operations. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00165"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018, January 18\u201322). Learning a discriminative feature network for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00199"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., and Zhang, L. (2021, January 10\u201317). Cvt: Introducing convolutions to vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 10\u201317). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Yuan, Y., Chen, X., Chen, X., and Wang, J. (2019). Segmentation transformer: Object-contextual representations for semantic segmentation. arXiv.","DOI":"10.1007\/978-3-030-58539-6_11"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/6\/1536\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:52:43Z","timestamp":1760122363000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/6\/1536"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,11]]},"references-count":50,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["rs15061536"],"URL":"https:\/\/doi.org\/10.3390\/rs15061536","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,11]]}}}