{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T15:39:17Z","timestamp":1761061157812,"version":"build-2065373602"},"reference-count":49,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2020,9,29]],"date-time":"2020-09-29T00:00:00Z","timestamp":1601337600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>Semantic segmentation plays an important role in being able to understand the content of remote sensing images. In recent years, deep learning methods based on Fully Convolutional Networks (FCNs) have proved to be effective for the semantic segmentation of remote sensing images. However, the rich information and complex content make the training of networks for segmentation challenging, and the datasets are necessarily constrained. In this paper, we propose a Convolutional Neural Network (CNN) model called Dual Path Attention Network (DPA-Net) that has a simple modular structure and can be added to any segmentation model to enhance its ability to learn features. Two types of attention module are appended to the segmentation model, one focusing on spatial information and the other focusing on the channel. Then, the outputs of these two attention modules are fused to further improve the network\u2019s ability to extract features, thus contributing to more precise segmentation results. Finally, data pre-processing and augmentation strategies are used to compensate for the small number of datasets and uneven distribution. The proposed network was tested on the Gaofen Image Dataset (GID). 
The results show that the network outperformed U-Net, PSP-Net, and DeepLab V3+ in terms of the mean IoU by 0.84%, 2.54%, and 1.32%, respectively.<\/jats:p>","DOI":"10.3390\/ijgi9100571","type":"journal-article","created":{"date-parts":[[2020,9,29]],"date-time":"2020-09-29T20:56:22Z","timestamp":1601412982000},"page":"571","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Dual Path Attention Net for Remote Sensing Semantic Image Segmentation"],"prefix":"10.3390","volume":"9","author":[{"given":"Jinglun","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100096, China"}]},{"given":"Jiapeng","family":"Xiu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100096, China"}]},{"given":"Zhengqiu","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100096, China"}]},{"given":"Chen","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100096, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,9,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1343","DOI":"10.1080\/01431161.2017.1399472","article-title":"Visual descriptors for content-based retrieval of remote-sensing images","volume":"39","author":"Napoletano","year":"2017","journal-title":"Int. J. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1109\/TGRS.2012.2205158","article-title":"Geographic Image Retrieval Using Local Invariant Features","volume":"51","author":"Yang","year":"2012","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1109\/LGRS.2018.2795531","article-title":"Fully Convolutional Networks for Semantic Segmentation of Very High Resolution Remotely Sensed Images Combined With DSM","volume":"15","author":"Sun","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Panboonyuen, T., Jitkajornwanich, K., Lawawirojwong, S., Srestasathiern, P., and Vateekul, P. (2019). Semantic Segmentation on Remotely Sensed Images Using an Enhanced Global Convolutional Network with Channel Attention and Domain Specific Transfer Learning. Remote Sens., 11.","DOI":"10.20944\/preprints201812.0090.v3"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1016\/j.isprsjprs.2017.12.007","article-title":"Semantic labeling in very high resolution images via a self-cascaded convolutional neural network","volume":"145","author":"Liu","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, H., Wang, Y., Zhang, Q., Xiang, S., and Pan, C. (2017). Gated convolutional neural network for semantic segmentation in high-resolution images. Remote Sens., 9.","DOI":"10.3390\/rs9050446"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MGRS.2017.2762307","article-title":"Deep learning in remote sensing: A comprehensive review and list of resources","volume":"5","author":"Zhu","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Panboonyuen, T., Vateekul, P., Jitkajornwanich, K., and Lawawirojwong, S. (2017). An Enhanced Deep Convolutional Encoder-Decoder Network for Road Segmentation on Aerial Imagery. 
Recent Advances in Information and Communication Technology Series, Springer.","DOI":"10.1007\/978-3-319-60663-7_18"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"5396","DOI":"10.1109\/TIP.2020.2983560","article-title":"Multi-Granularity Canonical Appearance Pooling for Remote Sensing Scene Classification","volume":"29","author":"Wang","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"7492","DOI":"10.1109\/TGRS.2019.2913816","article-title":"Robust Space\u2013Frequency Joint Representation for Remote Sensing Image Scene Classification","volume":"57","author":"Fang","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"6899","DOI":"10.1109\/TGRS.2018.2845668","article-title":"Remote Sensing Scene Classification Using Multilayer Stacked Covariance Pooling","volume":"56","author":"He","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Chen, Y., Fan, R., Yang, X., Wang, J., and Latif, A. (2018). Extraction of Urban Water Bodies from High-Resolution Remote-Sensing Imagery Using Deep Learning. Water, 10.","DOI":"10.3390\/w10050585"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3030","DOI":"10.1109\/JSTARS.2018.2846178","article-title":"Deep Convolutional Neural Network for Complex Wetland Classification Using Optical Remote Sensing Imagery","volume":"11","author":"Rezaee","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Mahdianpari, M., Salehi, B., Rezaee, M., Mohammadimanesh, F., and Zhang, Y. (2018). Very Deep Convolutional Neural Networks for Complex Land Cover Mapping Using Multispectral Remote Sensing Imagery. 
Remote Sens., 10.","DOI":"10.3390\/rs10071119"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yang, H., Wu, P., Yao, X., Wu, Y., Wang, B., and Xu, Y. (2018). Building Extraction in Very High Resolution Imagery by Dense-Attention Networks. Remote Sens., 10.","DOI":"10.3390\/rs10111768"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1480","DOI":"10.1109\/TPAMI.2017.2712691","article-title":"Scene Segmentation with DAG-Recurrent Neural Networks","volume":"40","author":"Shuai","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201322). Non-local Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liao, X., He, L., Yang, Z., and Zhang, C. (2018, January 2\u20136). Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention. Proceedings of the Asian Conference on Computer Vision, Perth, Australia.","DOI":"10.1007\/978-3-030-20876-9_39"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Du, Y., Yuan, C., Li, B., Zhao, L., Li, Y., and Hu, W. (2018, January 8\u201314). Interaction-Aware Spatio-Temporal Pyramid Attention Networks for Action Classification. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01270-0_23"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and So Kweon, I. (2018, January 8\u201314). Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., Albanie, S., and Sun, G. (2018, January 18\u201322). 
Squeeze-and-Excitation Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 16\u201320). Dual Attention Network for Scene Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"111322","DOI":"10.1016\/j.rse.2019.111322","article-title":"Land-Cover Classification with High-Resolution Remote Sensing Images Using Transferable Deep Models","volume":"237","author":"Tong","year":"2020","journal-title":"Remote Sens. Environ."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhao, X., Zhang, J., Tian, J., Zhuo, L., and Zhang, J. (2020). 
Residual Dense Network Based on Channel-Spatial Attention for the Scene Classification of a High-Resolution Remote Sensing Image. Remote Sens., 12.","DOI":"10.3390\/rs12111887"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3660","DOI":"10.1109\/TGRS.2016.2523563","article-title":"Semantic Annotation of High-Resolution Satellite Images via Weakly Supervised Learning","volume":"54","author":"Yao","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"6020","DOI":"10.1109\/TGRS.2016.2579648","article-title":"A Three-Layered Graph-Based Learning Approach for Remote Sensing Image Retrieval","volume":"54","author":"Wang","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"784","DOI":"10.1109\/JPROC.2012.2232891","article-title":"Airborne SAR-efficient signal processing for very high resolution","volume":"101","author":"Hubert","year":"2013","journal-title":"Proc. IEEE"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"3252","DOI":"10.1109\/JSTARS.2018.2860989","article-title":"Semantic segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module","volume":"11","author":"Yu","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"989","DOI":"10.1080\/01431168908903939","article-title":"Review Article Digital change detection techniques using remotely-sensed data","volume":"10","author":"Singh","year":"1989","journal-title":"Int. J. 
Remote Sens."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/j.isprsjprs.2018.07.002","article-title":"Towards a polyalgorithm for land use change detection","volume":"144","author":"Saxena","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1016\/j.isprsjprs.2018.04.013","article-title":"A scale-invariant change detection method for land use\/cover change research","volume":"141","author":"Xing","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Ding, H., Jiang, X., Shuai, B., Liu, A.Q., and Wang, G. (2018, January 18\u201322). Context contrasted feature and gated multiscale aggregation for scene segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00254"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I.D. (2017, January 21\u201326). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_40","unstructured":"Lazebnik, S., Schmid, C., and Ponce, J. (2006, January 17\u201322). Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201906), New York, NY, USA."},{"key":"ref_41","unstructured":"Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Peng, C., Zhang, X., Yu, G., Luo, G., and Sun, J. (2017, January 21\u201326). Large Kernel Matters\u2014Improve Semantic Segmentation by Global Convolutional Network. Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.189"},{"key":"ref_43","unstructured":"Mnih, V., Heess, N., and Graves, A. (2014, January 8\u201313). Recurrent models of visual attention. Proceedings of the Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1109\/TGRS.2018.2864987","article-title":"Scene classification with recurrent attention of VHR remote sensing images","volume":"57","author":"Wang","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_45","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., and Courville, A. (2015, January 7\u201313). Describing videos by exploiting temporal structure. 
Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.512"},{"key":"ref_47","unstructured":"Kuen, J., Wang, Z., and Wang, G. (July, January 26). Recurrent attentional networks for saliency detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_49","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning 2015, Lille, France."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/9\/10\/571\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:14:58Z","timestamp":1760177698000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/9\/10\/571"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,29]]},"references-count":49,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2020,10]]}},"alternative-id":["ijgi9100571"],"URL":"https:\/\/doi.org\/10.3390\/ijgi9100571","relation":{},"ISSN":["2220-9964"],"issn-type":[{"type":"electronic","value":"2220-9964"}],"subject":[],"published":{"date-parts":[[2020,9,29]]}}}