{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T14:11:57Z","timestamp":1773324717696,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"24","license":[{"start":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T00:00:00Z","timestamp":1608163200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61731022"],"award-info":[{"award-number":["61731022"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61860206004"],"award-info":[{"award-number":["61860206004"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61701495"],"award-info":[{"award-number":["61701495"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Building extraction from high spatial resolution remote sensing images is a hot spot in the field of remote sensing applications and computer vision. This paper presents a semantic segmentation model, which is a supervised method, named Pyramid Self-Attention Network (PISANet). 
Its structure is simple because it contains only two parts: one is the backbone of the network, which is used to learn the local features (short-distance context information around the pixel) of buildings from the image; the other is the pyramid self-attention module, which is used to obtain the global features (long-distance context information with other pixels in the image) and the comprehensive features (including color, texture, geometric and high-level semantic features) of the building. The network is an end-to-end approach. In the training stage, the inputs are the remote sensing image and its corresponding label, and the output is a probability map (the probability that each pixel is or is not a building). In the prediction stage, the input is the remote sensing image, and the output is the building extraction result. The complexity of the network structure was reduced so that it is easy to implement. The proposed PISANet was tested on two datasets. The results show that the overall accuracy reached 94.50% and 96.15%, the intersection-over-union reached 77.45% and 87.97%, and the F1 score reached 87.27% and 93.55%, respectively. 
In experiments on different datasets, PISANet obtained high overall accuracy, low error rate and improved integrity of individual buildings.<\/jats:p>","DOI":"10.3390\/s20247241","type":"journal-article","created":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T10:42:47Z","timestamp":1608201767000},"page":"7241","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":41,"title":["Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9218-9941","authenticated-orcid":false,"given":"Dengji","family":"Zhou","sequence":"first","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guizhou","family":"Wang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guojin","family":"He","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3572-4415","authenticated-orcid":false,"given":"Tengfei","family":"Long","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5594-0815","authenticated-orcid":false,"given":"Ranyu","family":"Yin","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, 
China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhaoming","family":"Zhang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sibao","family":"Chen","sequence":"additional","affiliation":[{"name":"MOE Key Lab of Signal Processing and Intelligent Computing, School of Computer Science and Technology, Anhui University, Hefei 230601, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5948-5055","authenticated-orcid":false,"given":"Bin","family":"Luo","sequence":"additional","affiliation":[{"name":"MOE Key Lab of Signal Processing and Intelligent Computing, School of Computer Science and Technology, Anhui University, Hefei 230601, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.isprsjprs.2019.02.019","article-title":"Automatic building extraction from high-resolution aerial images and LiDAR data using gated residual refinement network","volume":"151","author":"Huang","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Wang, Y. (2019). JointNet: A common neural network for road and building extraction. Remote Sens., 11.","DOI":"10.3390\/rs11060696"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Sun, G., Huang, H., Zhang, A., Li, F., Zhao, H., and Fu, H. (2019). Fusion of multiscale convolutional neural networks for building extraction in very high-resolution images. 
Remote Sens., 11.","DOI":"10.3390\/rs11030227"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1016\/0734-189X(90)90139-M","article-title":"Use of shadows for extracting buildings in aerial images","volume":"49","author":"Liow","year":"1990","journal-title":"Comput. Vis. Graph. Image Process."},{"key":"ref_5","first-page":"2196","article-title":"Automated building extraction from high-resolution satellite imagery in urban areas using structural, contextual, and spectral information","volume":"14","author":"Jin","year":"2005","journal-title":"EURASIP J. Adv. Signal Process."},{"key":"ref_6","first-page":"255","article-title":"Automatic Building Extraction from Satellite Imagery","volume":"13","author":"Theng","year":"2006","journal-title":"Eng. Lett."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lef\u00e8vre, S., Weber, J., and Sheeren, D. (2007, January 11\u201313). Automatic building extraction in VHR images using advanced morphological operators. Proceedings of the Urban Remote Sensing Joint Event (IEEE), Paris, France.","DOI":"10.1109\/URS.2007.371825"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"721","DOI":"10.14358\/PERS.77.7.721","article-title":"A multidirectional and multiscale morphological index for automatic building extraction from multispectral GeoEye-1 imagery","volume":"77","author":"Huang","year":"2011","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1109\/JSTARS.2011.2168195","article-title":"Morphological building\/shadow index for building extraction from high-resolution imagery over urban areas","volume":"5","author":"Huang","year":"2011","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Jiang, N., Zhang, J.X., Li, H.T., and Lin, X.G. (July, January 30). 
Semi-automatic building extraction from high resolution imagery based on segmentation. Proceedings of the 2008 International Workshop on Earth Observation and Remote Sensing Applications (IEEE), Beijing, China.","DOI":"10.1109\/EORSA.2008.4620311"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Liu, P., Di, L., Du, Q., and Wang, L. (2018). Remote sensing big data: Theory, methods and applications. Remote Sens., 10.","DOI":"10.3390\/rs10050711"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"481","DOI":"10.5194\/isprs-archives-XLII-1-W1-481-2017","article-title":"Building extraction from remote sensing data using fully convolutional networks","volume":"42","author":"Bittner","year":"2017","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. ISPRS Arch."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Shrestha, S., and Vanneschi, L. (2018). Improved fully convolutional network with conditional random fields for building extraction. Remote Sens., 10.","DOI":"10.3390\/rs10071135"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xu, Y., Wu, L., Xie, Z., and Chen, Z. (2018). Building extraction in very high resolution remote sensing imagery using deep learning and guided filters. Remote Sens., 10.","DOI":"10.3390\/rs10010144"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhong, Z., Li, J., Cui, W., and Jiang, H. (2016, January 10\u201315). Fully convolutional networks for building and road extraction: Preliminary results. 
Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China.","DOI":"10.1109\/IGARSS.2016.7729406"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Huang, Z., Cheng, G., Wang, H., Li, H., Shi, L., and Pan, C. (2016, January 10\u201315). Building extraction from multi-source remote sensing images via deep deconvolution neural networks. Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China.","DOI":"10.1109\/IGARSS.2016.7729471"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2793","DOI":"10.1109\/TPAMI.2017.2750680","article-title":"Learning building extraction in aerial scenes with convolutional networks","volume":"40","author":"Yuan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"3308","DOI":"10.1080\/01431161.2018.1528024","article-title":"A scale robust convolutional neural network for automatic building extraction from aerial and satellite imagery","volume":"40","author":"Ji","year":"2019","journal-title":"Int. J. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2600","DOI":"10.1109\/JSTARS.2018.2835377","article-title":"Building extraction at scale using convolutional neural network: Mapping of the United States","volume":"11","author":"Yang","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201322). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 16\u201320). Dual attention network for scene segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, X., Zhong, Z., Wu, J., Yang, Y., Lin, Z., and Liu, H. (2019, January 16\u201320). Expectation-maximization attention networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00926"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zhang, Y., Liu, S., Shi, J., Change Loy, C., Lin, D., and Jia, J. (2018, January 8\u201314). Psanet: Point-wise spatial attention network for scene parsing. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_17"},{"key":"ref_25","unstructured":"Chen, Y., Kalantidis, Y., Li, J., Yan, S., and Feng, J. (2018, January 3\u20138). A^ 2-nets: Double attention networks. Proceedings of the Advances in Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (2019, January 16\u201320). Ccnet: Criss-cross attention for semantic segmentation. Proceedings of the IEEE International Conference on Computer Vision, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00069"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"712","DOI":"10.1109\/JSTARS.2016.2598859","article-title":"Active deep learning for classification of hyperspectral images","volume":"10","author":"Liu","year":"2016","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"7053","DOI":"10.1007\/s00500-016-2247-2","article-title":"SVM or deep learning? 
A comparative study on remote sensing image classification","volume":"21","author":"Liu","year":"2017","journal-title":"Soft Comput."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liu, P., Liu, X., Liu, M., Shi, Q., Yang, J., Xu, X., and Zhang, Y. (2019). Building footprint extraction from high-resolution images via spatial residual inception convolutional neural network. Remote Sens., 11.","DOI":"10.3390\/rs11070830"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3680","DOI":"10.1109\/JSTARS.2018.2865187","article-title":"Building-a-nets: Robust building extraction from high-resolution remote sensing images with adversarial networks","volume":"11","author":"Li","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_31","unstructured":"Zhu, Z., Xu, M., Bai, S., Huang, T., and Bai, X. (November, January 27). Asymmetric non-local neural networks for semantic segmentation. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_32","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_33","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Maggiori, E., Tarabalka, Y., Charpiat, G., and Alliez, P. 
(2017, January 23\u201328). Can semantic labeling methods generalize to any city? The inria aerial image labeling benchmark. Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8127684"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1109\/TGRS.2018.2858817","article-title":"Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set","volume":"57","author":"Ji","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_39","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. 
arXiv, Available online: https:\/\/arxiv.org\/abs\/1706.05587."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/24\/7241\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:46:27Z","timestamp":1760179587000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/24\/7241"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,17]]},"references-count":39,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["s20247241"],"URL":"https:\/\/doi.org\/10.3390\/s20247241","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,17]]}}}