{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T02:27:21Z","timestamp":1773714441061,"version":"3.50.1"},"reference-count":57,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2021,7,6]],"date-time":"2021-07-06T00:00:00Z","timestamp":1625529600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["Nos. 41871308"],"award-info":[{"award-number":["Nos. 41871308"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key R&D Program of China (International Scientific & Technological Cooperation Program","award":["2019YFE0106500"],"award-info":[{"award-number":["2019YFE0106500"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during the down-sampling process, which results in discontinuous segmentation or inaccurate segmentation boundary. In order to compensate for the loss of shape information, two shape-related auxiliary tasks (i.e., boundary prediction and distance estimation) were jointly learned with building segmentation task in our proposed network. Meanwhile, two consistency constraint losses were designed based on the multi-task network to exploit the duality between the mask prediction and two shape-related information predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. 
Based on the multi-scale features, one regression loss and two classification losses were used for predicting the distance-transform map, segmentation, and boundary. Two inter-task consistency-loss functions were constructed to ensure the consistency between distance maps and masks, and the consistency between masks and boundary maps. Experimental results on three public aerial image data sets showed that our method achieved superior performance over the recent state-of-the-art models.<\/jats:p>","DOI":"10.3390\/rs13142656","type":"journal-article","created":{"date-parts":[[2021,7,6]],"date-time":"2021-07-06T11:36:44Z","timestamp":1625571404000},"page":"2656","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["A Multi-Task Network with Distance\u2013Mask\u2013Boundary Consistency Constraints for Building Extraction from Aerial Images"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0750-954X","authenticated-orcid":false,"given":"Furong","family":"Shi","sequence":"first","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0683-4669","authenticated-orcid":false,"given":"Tong","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,7,6]]},"reference":[{"key":"ref_1","first-page":"653","article-title":"A Survey of Building Extraction Methods from Optical High Resolution Remote Sensing Imagery","volume":"31","author":"Jun","year":"2016","journal-title":"Remote Sens. Technol. 
Appl."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1016\/j.isprsjprs.2019.11.028","article-title":"Extraction of urban building damage using spectral, height and corner information from VHR satellite images and airborne LiDAR data","volume":"159","author":"Wang","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Liao, C., Hu, H., Li, H., Ge, X., Chen, M., Li, C., and Zhu, Q. (2021). Joint Learning of Contour and Structure for Boundary-Preserved Building Extraction. Remote Sens., 13.","DOI":"10.3390\/rs13061049"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Jin, Y., Xu, W., Zhang, C., Luo, X., and Jia, H. (2021). Boundary-Aware Refined Network for Automatic Building Extraction in Very High-Resolution Urban Aerial Images. Remote Sens., 13.","DOI":"10.3390\/rs13040692"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017). A review on deep learning techniques applied to semantic segmentation. arXiv.","DOI":"10.1016\/j.asoc.2018.05.018"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MGRS.2017.2762307","article-title":"Deep learning in remote sensing: A comprehensive review and list of resources","volume":"5","author":"Zhu","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. 
International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., and Fowlkes, C.C. (2016, January 11\u201314). Laplacian pyramid reconstruction and refinement for semantic segmentation. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46487-9_32"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I. (2017, January 21\u201326). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1109\/TGRS.2018.2858817","article-title":"Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set","volume":"57","author":"Ji","year":"2018","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1350","DOI":"10.3390\/rs10091350","article-title":"A Multiple-Feature Reuse Network to Extract Buildings from Remote Sensing Imagery","volume":"10","author":"Lin","year":"2018","journal-title":"Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1016\/j.isprsjprs.2017.11.009","article-title":"Classification with an edge: Improving semantic image segmentation with boundary detection","volume":"135","author":"Marmanis","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"5769","DOI":"10.1109\/JSTARS.2017.2747599","article-title":"FusionNet: Edge aware deep convolutional networks for semantic segmentation of remote sensing harbor images","volume":"10","author":"Cheng","year":"2017","journal-title":"IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, S., Ding, W., Liu, C., Liu, Y., Wang, Y., and Li, H. (2018). ERN: Edge loss reinforced semantic segmentation network for remote sensing images. Remote Sens., 10.","DOI":"10.3390\/rs10091339"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, S., and Jiang, W. (2021). Boundary-Assisted Learning for Building Extraction from Optical Remote Sensing Imagery. Remote Sens., 13.","DOI":"10.3390\/rs13040760"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.isprsjprs.2020.09.019","article-title":"Parsing very high resolution urban scene images by learning deep ConvNets with edge-aware loss","volume":"170","author":"Zheng","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yang, G., Zhang, Q., and Zhang, G. (2020). EANet: Edge-aware network for the extraction of buildings from aerial images. 
Remote Sens., 12.","DOI":"10.3390\/rs12132161"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Bischke, B., Helber, P., Folz, J., Borth, D., and Dengel, A. (2019, January 22\u201325). Multi-task learning for segmentation of building footprints with deep neural networks. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803050"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"786","DOI":"10.1109\/LGRS.2018.2880986","article-title":"Effective building extraction from high-resolution remote sensing images with multitask driven deep neural network","volume":"16","author":"Hui","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Newell, A., Yang, K., and Deng, J. (2016). Stacked hourglass networks for human pose estimation. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"ref_23","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European conference on computer vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). 
Pyramid Scene Parsing Network. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yang, J., Price, B., Cohen, S., Lee, H., and Yang, M.H. (2016, January 27\u201330). Object contour detection with a fully convolutional encoder-decoder network. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.28"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xie, S., and Tu, Z. (2015, January 7\u201313). Holistically-nested edge detection. Proceedings of the IEEE Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.164"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Liu, Y., Cheng, M.M., Hu, X., Wang, K., and Bai, X. (2017, January 21\u201326). Richer convolutional features for edge detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.622"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Bertasius, G., Shi, J., and Torresani, L. (2016, January 27\u201330). Semantic segmentation with boundary neural fields. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.392"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Takikawa, T., Acuna, D., Jampani, V., and Fidler, S. (November, January 27). Gated-SCNN: Gated Shape CNNs for Semantic Segmentation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00533"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Hatamizadeh, A., Terzopoulos, D., and Myronenko, A. (2020). Edge-gated CNNs for volumetric semantic segmentation of medical images. 
arXiv.","DOI":"10.1101\/2020.03.14.992115"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"102795","DOI":"10.1016\/j.cviu.2019.102795","article-title":"Faster training of Mask R-CNN by focusing on instance boundaries","volume":"188","author":"Zimmermann","year":"2019","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cheng, T., Wang, X., Huang, L., and Liu, W. (2020). Boundary-Preserving Mask R-CNN. Trans. Petri Nets Other Models Concurr. XV, 660\u2013676.","DOI":"10.1007\/978-3-030-58568-6_39"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"2793","DOI":"10.1109\/TPAMI.2017.2750680","article-title":"Learning building extraction in aerial scenes with convolutional networks","volume":"40","author":"Yuan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1016\/j.isprsjprs.2020.01.023","article-title":"Aerial image semantic segmentation using DCNN predicted distance maps","volume":"161","author":"Chai","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Hayder, Z., He, X., and Salzmann, M. (2017, January 21\u201326). Boundary-aware instance segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.70"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Wang, Y., Wei, X., Liu, F., Chen, J., Zhou, Y., Shen, W., Fishman, E.K., and Yuille, A.L. (2020, January 13\u201319). Deep distance transform for tubular structure segmentation in CT scans. 
Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00389"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/S0262-8856(98)00092-4","article-title":"Development of a graph-based approach for building detection","volume":"17","author":"Kim","year":"1999","journal-title":"Image Vis. Comput."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"2063","DOI":"10.1109\/JSTARS.2014.2369475","article-title":"Shadow-based rooftop segmentation in visible band images","volume":"8","author":"Femiani","year":"2014","journal-title":"IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"4483","DOI":"10.1109\/TGRS.2015.2400462","article-title":"Robust rooftop extraction from visible band images using higher order CRF","volume":"53","author":"Li","year":"2015","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1016\/j.isprsjprs.2007.05.011","article-title":"Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features","volume":"62","author":"Inglada","year":"2007","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_42","first-page":"58","article-title":"Building extraction from high-resolution optical spaceborne images using the integration of support vector machine (SVM) classification, Hough transformation and perceptual grouping","volume":"34","author":"Turker","year":"2015","journal-title":"Int. J. Appl. Earth Obs."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Guo, Z., Chen, Q., Wu, G., Xu, Y., Shibasaki, R., and Shao, X. (2017). Village building identification based on ensemble convolutional neural networks. 
Sensors, 17.","DOI":"10.3390\/s17112487"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Guo, Z., Shao, X., Xu, Y., Miyazaki, H., Ohira, W., and Shibasaki, R. (2016). Identification of village building via Google Earth images and supervised machine learning methods. Remote Sens., 8.","DOI":"10.3390\/rs8040271"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Huang, Z., Cheng, G., Wang, H., Li, H., Shi, L., and Pan, C. (2016, January 10\u201315). Building extraction from multi-source remote sensing images via deep deconvolution neural networks. Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China.","DOI":"10.1109\/IGARSS.2016.7729471"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1109\/TGRS.2016.2612821","article-title":"Convolutional neural networks for large-scale remote-sensing image classification","volume":"55","author":"Maggiori","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"2178","DOI":"10.1109\/TGRS.2019.2954461","article-title":"Toward automatic building footprint delineation from aerial images using CNN and regularization","volume":"58","author":"Wei","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Kang, W., Xiang, Y., Wang, F., and You, H. (2019). EU-net: An efficient fully convolutional network for building extraction from optical remote sensing images. Remote Sens., 11.","DOI":"10.3390\/rs11232813"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. 
Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Ye, Z., Fu, Y., Gan, M., Deng, J., Comber, A., and Wang, K. (2019). Building extraction from very high resolution aerial imagery using joint attention deep neural network. Remote Sens., 11.","DOI":"10.3390\/rs11242970"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Zhu, Q., Liao, C., Hu, H., Mei, X., and Li, H. (2020). MAP-Net: Multiple attending path neural network for building footprint extraction from remote sensed imagery. IEEE Trans. Geosci. Remote Sens.","DOI":"10.1109\/TGRS.2020.3026051"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Zamir, A.R., Sax, A., Cheerla, N., Suri, R., Cao, Z., Malik, J., and Guibas, L.J. (2020, January 13\u201319). Robust Learning Through Cross-Task Consistency. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01121"},{"key":"ref_53","unstructured":"(2018, July 07). ISPRS 2D Semantic Labeling Contest. Available online: http:\/\/www2.isprs.org\/commissions\/comm3\/wg4\/2d-sem-label-vaihingen.html."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Maggiori, E., Tarabalka, Y., Charpiat, G., and Alliez, P. (2017, January 23\u201328). Can semantic labeling methods generalize to any city? The inria aerial image labeling benchmark. 
Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8127684"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"1842","DOI":"10.1109\/JSTARS.2020.2991391","article-title":"Refined extraction of building outlines from high-resolution remote sensing imagery based on a multi feature convolutional neural network and morphological filtering","volume":"13","author":"Xie","year":"2020","journal-title":"IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., and Sorkine-Hornung, A. (2016, January 27\u201330). A benchmark dataset and evaluation methodology for video object segmentation. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.85"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Liu, P., Liu, X., Liu, M., Shi, Q., Yang, J., Xu, X., and Zhang, Y. (2019). Building footprint extraction from high-resolution images via spatial residual inception convolutional neural network. 
Remote Sens., 11.","DOI":"10.3390\/rs11070830"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/14\/2656\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:26:43Z","timestamp":1760164003000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/14\/2656"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,6]]},"references-count":57,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["rs13142656"],"URL":"https:\/\/doi.org\/10.3390\/rs13142656","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,6]]}}}