{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,10]],"date-time":"2026-07-10T00:15:59Z","timestamp":1783642559793,"version":"3.55.0"},"reference-count":56,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2020,4,28]],"date-time":"2020-04-28T00:00:00Z","timestamp":1588032000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41701446\uff0c41971356"],"award-info":[{"award-number":["41701446\uff0c41971356"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Semantic segmentation of high-resolution remote sensing images plays an important role in applications for building extraction. However, the current algorithms have some semantic information extraction limitations, and these can lead to poor segmentation results. To extract buildings with high accuracy, we propose a multiloss neural network based on attention. The designed network, based on U-Net, can improve the sensitivity of the model by the attention block and suppress the background influence of irrelevant feature areas. To improve the ability of the model, a multiloss approach is proposed during training the network. The experimental results show that the proposed model offers great improvement over other state-of-the-art methods. For the public Inria Aerial Image Labeling dataset, the F1 score reached 76.96% and showed good performance on the Aerial Imagery for Roof Segmentation dataset.<\/jats:p>","DOI":"10.3390\/rs12091400","type":"journal-article","created":{"date-parts":[[2020,4,29]],"date-time":"2020-04-29T01:29:15Z","timestamp":1588123755000},"page":"1400","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":149,"title":["Building Extraction Based on U-Net with an Attention Block and Multiple Losses"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4097-4814","authenticated-orcid":false,"given":"Mingqiang","family":"Guo","sequence":"first","affiliation":[{"name":"School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Heng","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7421-4915","authenticated-orcid":false,"given":"Yongyang","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ying","family":"Huang","sequence":"additional","affiliation":[{"name":"Wuhan Zondy Cyber Technology Co. Ltd., Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2020,4,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Rees, W.G. (2013). Physical Principles of Remote Sensing, Cambridge University Press.","DOI":"10.1017\/CBO9781139017411"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1929","DOI":"10.1080\/13658816.2017.1341632","article-title":"Quality assessment of building footprint data using a deep autoencoder network","volume":"31","author":"Xu","year":"2017","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Liu, Y., Minh Nguyen, D., Deligiannis, N., Ding, W., and Munteanu, A. (2017). Hourglass-shapenetwork based semantic segmentation for high resolution aerial imagery. Remote Sens., 9.","DOI":"10.3390\/rs9060522"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Liu, Y., Piramanayagam, S., Monteiro, S.T., and Saber, E. (2017, January 21\u201326). Dense semantic labeling of very-high-resolution aerial imagery and lidar with fully-convolutional neural networks and higher-order CRFs. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2017), Honolulu, Hawaii, USA.","DOI":"10.1109\/CVPRW.2017.200"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Pan, X., Gao, L., Marinoni, A., Zhang, B., Yang, F., and Gamba, P. (2018). Semantic labeling of high resolution aerial imagery and LiDAR data with fine segmentation network. Remote Sens., 10.","DOI":"10.3390\/rs10050743"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Xu, Y., Wu, L., Xie, Z., and Chen, Z. (2018). Building extraction in very high resolution remote sensing imagery using deep learning and guided filters. Remote Sens., 10.","DOI":"10.3390\/rs10010144"},{"key":"ref_7","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (July, January 26). The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1156","DOI":"10.1109\/TGRS.2008.2008440","article-title":"Urban-Area and Building Detection Using SIFT Keypoints and Graph Theory","volume":"47","author":"Sirmacek","year":"2009","journal-title":"IEEE Trans. Geosci. Remote"},{"key":"ref_9","first-page":"161","article-title":"Morphological Building\/Shadow Index for Building Extraction from High-Resolution Imagery over Urban Areas","volume":"5","author":"Huang","year":"2012","journal-title":"IEEE J.-Stars"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1388","DOI":"10.1109\/LGRS.2016.2590481","article-title":"A Morphological Building Detection Framework for High-Resolution Optical Imagery over Urban Areas","volume":"13","author":"Zhang","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_11","first-page":"150","article-title":"Automatic urban building boundary extraction from high resolution aerial images using an innovative model of active contours","volume":"12","author":"Ahmadi","year":"2010","journal-title":"Int. J. Appl. Earth Obs."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1127","DOI":"10.1080\/01431161.2016.1148283","article-title":"Building extraction in satellite images using active contours and colour features","volume":"37","author":"Liasis","year":"2016","journal-title":"Int. J. Remote Sens."},{"key":"ref_13","first-page":"906","article-title":"Building Extraction from Remotely Sensed Images by Integrating Saliency Cue","volume":"10","author":"Li","year":"2017","journal-title":"IEEE J.-Stars"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.isprsjprs.2013.09.004","article-title":"Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts","volume":"86","author":"Ok","year":"2013","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/j.isprsjprs.2015.03.011","article-title":"Semantic classification of urban buildings combining VHR image and GIS data: An improved random forest approach","volume":"105","author":"Du","year":"2015","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_16","first-page":"58","article-title":"Building extraction from high-resolution optical spaceborne images using the integration of support vector machine (SVM) classification, Hough transformation and perceptual grouping","volume":"34","author":"Turker","year":"2015","journal-title":"Int. J. Appl. Earth Obs."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1016\/j.isprsjprs.2007.05.011","article-title":"Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features","volume":"62","author":"Inglada","year":"2007","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1109\/MSP.2017.2749125","article-title":"Advanced Deep-Learning Techniques for Salient and Category-Specific Object Detection: A Survey","volume":"35","author":"Han","year":"2018","journal-title":"IEEE Signal Proc. Mag."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.isprsjprs.2017.05.002","article-title":"Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks","volume":"130","author":"Alshehhi","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"7092","DOI":"10.1109\/TGRS.2017.2740362","article-title":"High-Resolution Aerial Image Labeling with Convolutional Neural Networks","volume":"55","author":"Maggiori","year":"2017","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Shrestha, S., and Vanneschi, L. (2018). Improved fully convolutional network with conditional random fields for building extraction. Remote Sens., 10.","DOI":"10.3390\/rs10071135"},{"key":"ref_23","unstructured":"Allen-Zhu, Z., Li, Y., and Song, Z. (July, January 10). A Convergence Theory for Deep Learning via over-Parameterization. Proceedings of the 36th International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1254","DOI":"10.1109\/34.730558","article-title":"A model of saliency-based visual attention for rapid scene analysis","volume":"20","author":"Itti","year":"1998","journal-title":"IEEE Trans. Pattern Anal. Mach. Intel."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1038\/35058500","article-title":"Computational modelling of visual attention","volume":"2","author":"Itti","year":"2001","journal-title":"Nat. Rev. Neurosci."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Xu, Y., Xie, Z., Feng, Y., and Chen, Z. (2018). Road extraction from high-resolution remote sensing imagery using deep learning. Remote Sens., 10.","DOI":"10.3390\/rs10091461"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation, Springer.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Pinheiro, P.O., Lin, T., Collobert, R., and Doll\u00e1r, P. (2016). Learning to Refine Object Segments, Springer.","DOI":"10.1007\/978-3-319-46448-0_5"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lin, T., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii Convention Center, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1109\/LSP.2008.2002718","article-title":"Scene Segmentation and Semantic Representation for High-Level Retrieval","volume":"15","author":"Zhu","year":"2008","journal-title":"IEEE Signal Proc. Lett."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., and Liang, J. (2018). Unet++: A Nested U-Net Architecture for Medical Image Segmentation, Springer.","DOI":"10.1007\/978-3-030-00889-5_1"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intel."},{"key":"ref_35","unstructured":"Chen, L., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Chen, L., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Cao, C., Liu, X., Yang, Y., Yu, Y., Wang, J., Wang, Z., Huang, Y., Wang, L., Huang, C., and Xu, W. (2015, January 7\u201313). Look and Think Twice: Capturing Top-down Visual Attention with Feedback Convolutional Neural Networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.338"},{"key":"ref_38","unstructured":"Larochelle, H., and Hinton, G.E. (2010). Learning to combine foveal glimpses with a third-order Boltzmann machine. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_39","unstructured":"Mnih, V., Heess, N., and Graves, A. (2014). Recurrent models of visual attention. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"53","DOI":"10.3389\/fncom.2011.00053","article-title":"A biologically plausible transform for visual recognition that is invariant to translation, scale, and rotation","volume":"5","author":"Sountsov","year":"2011","journal-title":"Front. Comput. Neurosc."},{"key":"ref_41","unstructured":"Jaderberg, M., Simonyan, K., and Zisserman, A. (2015). Spatial transformer networks. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_42","unstructured":"Bluche, T. (2016). Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_43","unstructured":"Miech, A., Laptev, I., and Sivic, J. (2017). Learnable pooling with context gating for video classification. arXiv."},{"key":"ref_44","unstructured":"Stollenga, M.F., Masci, J., Gomez, F., and Schmidhuber, J. (2014). Deep networks with internal selective attention through feedback connections. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). Squeeze-and-excitation networks. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_46","unstructured":"Shi, W., Caballero, J., Husz\u00e1r, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., and Wang, Z. (July, January 26). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_47","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (July, January 21). Pyramid scene parsing network. Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Dai, J., He, K., and Sun, J. (2015, January 7\u201312). Convolutional feature masking for joint object and stuff segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299025"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"929","DOI":"10.1109\/TPAMI.2007.1046","article-title":"Toward Objective Evaluation of Image Segmentation Algorithms","volume":"29","author":"Unnikrishnan","year":"2007","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1080\/01621459.1983.10478008","article-title":"A method for comparing two hierarchical clusterings","volume":"78","author":"Fowlkes","year":"1983","journal-title":"J. Am. Stat Assoc."},{"key":"ref_51","first-page":"209","article-title":"Distance measures for image segmentation evaluation","volume":"2006","author":"Jiang","year":"2006","journal-title":"EURASIP J. Appl. Signal Proc."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Unnikrishnan, R., and Hebert, M. (2005, January 5\u20137). Measures of similarity. Proceedings of the 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV\/MOTION\u201905)-Volume 1, Breckenridge, CO, USA.","DOI":"10.1109\/ACVMOT.2005.71"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Pan, X., Yang, F., Gao, L., Chen, Z., Zhang, B., Fan, H., and Ren, J. (2019). Building Extraction from High-Resolution Aerial Imagery Using a Generative Adversarial Network with Spatial and Channel Attention Mechanisms. Remote Sens., 11.","DOI":"10.3390\/rs11080917"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Bischke, B., Helber, P., Folz, J., Borth, D., and Dengel, A. (2019, January 22\u201325). Multi-task learning for segmentation of building footprints with deep neural networks. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803050"},{"key":"ref_55","unstructured":"Khalel, A., and El-Saban, M. (2018). Automatic pixelwise object labeling for aerial imagery using stacked u-nets. arXiv."},{"key":"ref_56","unstructured":"Marcu, A., Costea, D., Slusanschi, E., and Leordeanu, M. (2018). A multi-stage multi-task neural network for aerial scene interpretation and geolocalization. arXiv."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/9\/1400\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:09:23Z","timestamp":1760364563000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/9\/1400"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,28]]},"references-count":56,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2020,5]]}},"alternative-id":["rs12091400"],"URL":"https:\/\/doi.org\/10.3390\/rs12091400","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,4,28]]}}}