{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T22:07:40Z","timestamp":1772143660655,"version":"3.50.1"},"reference-count":56,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2020,7,22]],"date-time":"2020-07-22T00:00:00Z","timestamp":1595376000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Semantic segmentation is an important and challenging task in the aerial image community since it can extract the target level information for understanding the aerial image. As a practical application of aerial image semantic segmentation, building extraction always attracts researchers\u2019 attention as the building is the specific land cover in the aerial images. There are two key points for building extraction from aerial images. One is learning the global and local features to fully describe the buildings with diverse shapes. The other one is mining the multi-scale information to discover the buildings with different resolutions. Taking these two key points into account, we propose a new method named global multi-scale encoder-decoder network (GMEDN) in this paper. Based on the encoder-decoder framework, GMEDN is developed with a local and global encoder and a distilling decoder. The local and global encoder aims at learning the representative features from the aerial images for describing the buildings, while the distilling decoder focuses on exploring the multi-scale information for the final segmentation masks. Combining them together, the building extraction is accomplished in an end-to-end manner. The effectiveness of our method is validated by the experiments counted on two public aerial image datasets. 
Compared with some existing methods, our model can achieve better performance.<\/jats:p>","DOI":"10.3390\/rs12152350","type":"journal-article","created":{"date-parts":[[2020,7,23]],"date-time":"2020-07-23T11:26:01Z","timestamp":1595503561000},"page":"2350","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":68,"title":["Building Extraction of Aerial Images by a Global and Multi-Scale Encoder-Decoder Network"],"prefix":"10.3390","volume":"12","author":[{"given":"Jingjing","family":"Ma","sequence":"first","affiliation":[{"name":"Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi\u2019an 710071, China"}]},{"given":"Linlin","family":"Wu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi\u2019an 710071, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1375-0778","authenticated-orcid":false,"given":"Xu","family":"Tang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi\u2019an 710071, China"}]},{"given":"Fang","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology, 
Nanjing 210094, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0379-2042","authenticated-orcid":false,"given":"Xiangrong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi\u2019an 710071, China"}]},{"given":"Licheng","family":"Jiao","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi\u2019an 710071, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,7,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Maggiori, E., Tarabalka, Y., Charpiat, G., and Alliez, P. (2017, January 23\u201328). Can semantic labeling methods generalize to any city? The inria aerial image labeling benchmark. Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8127684"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1865","DOI":"10.1109\/JPROC.2017.2675998","article-title":"Remote sensing image scene classification: Benchmark and state of the art","volume":"105","author":"Cheng","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"473","DOI":"10.5194\/isprs-annals-III-3-473-2016","article-title":"Semantic segmentation of aerial images with an ensemble of CNNs","volume":"3","author":"Marmanis","year":"2016","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. 
Sci."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Tang, X., Liu, C., Ma, J., Zhang, X., Liu, F., and Jiao, L. (2019). Large-Scale Remote Sensing Image Retrieval Based on Semi-Supervised Adversarial Hashing. Remote Sens., 11.","DOI":"10.3390\/rs11172055"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Tang, X., Zhang, X., Liu, F., and Jiao, L. (2018). Unsupervised deep feature learning for remote sensing image retrieval. Remote Sens., 10.","DOI":"10.3390\/rs10081243"},{"key":"ref_6","unstructured":"Mou, L., and Zhu, X.X. (2018). RiFCN: Recurrent network in fully convolutional network for semantic segmentation of high resolution remote sensing images. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, X., Ma, W., Li, C., Wu, J., Tang, X., and Jiao, L. (2019). Fully Convolutional Network-Based Ensemble Method for Road Extraction From Aerial Images. IEEE Geosci. Remote Sens. Lett.","DOI":"10.1109\/LGRS.2019.2953523"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhang, X., Han, X., Li, C., Tang, X., Zhou, H., and Jiao, L. (2019). Aerial image road extraction based on an improved generative adversarial network. Remote Sens., 11.","DOI":"10.3390\/rs11080930"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1109\/TGRS.2018.2858817","article-title":"Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set","volume":"57","author":"Ji","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"6699","DOI":"10.1109\/TGRS.2018.2841808","article-title":"Vehicle instance segmentation from aerial image and video using a multitask learning residual fully convolutional network","volume":"56","author":"Mou","year":"2018","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1109\/34.868688","article-title":"Normalized cuts and image segmentation","volume":"22","author":"Shi","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1145\/1015706.1015720","article-title":"GrabCut: Interactive foreground extraction using iterated graph cuts","volume":"23","author":"Rother","year":"2004","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Shotton, J., Johnson, M., and Cipolla, R. (2008, January 23\u201328). Semantic texton forests for image categorization and segmentation. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587503"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Vezhnevets, A., Ferrari, V., and Buhmann, J.M. (2011, January 6\u201313). Weakly supervised semantic segmentation with a multi-image model. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126299"},{"key":"ref_15","unstructured":"Thoma, M. (2016). A survey of semantic segmentation. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Li, A., and Bao, X. (2010, January 23\u201324). Extracting image dominant color features based on region growing. Proceedings of the 2010 International Conference on Web Information Systems and Mining, Sanya, China.","DOI":"10.1109\/WISM.2010.116"},{"key":"ref_17","unstructured":"Hong, Z., and Xuanbing, Z. (2010, January 22\u201324). Texture feature extraction based on wavelet transform. Proceedings of the 2010 International Conference on Computer Application and System Modeling (ICCASM 2010), Taiyuan, China."},{"key":"ref_18","unstructured":"Wang, J., Xu, Z., and Liu, Y. (2013, January 13\u201314). 
Texture-based segmentation for extracting image shape features. Proceedings of the 2013 19th International Conference on Automation and Computing, London, UK."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"5158","DOI":"10.1109\/JSTARS.2015.2495267","article-title":"Remote sensing image classification: No features, no clustering","volume":"8","author":"Cui","year":"2015","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1109\/72.991427","article-title":"A comparison of methods for multiclass support vector machines","volume":"13","author":"Hsu","year":"2002","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1109\/21.97458","article-title":"A survey of decision tree classifier methodology","volume":"21","author":"Safavian","year":"1991","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhang, J., Hong, X., Guan, S.U., Zhao, X., Xin, H., and Xue, N. (2016, January 23\u201325). Maximum Gaussian mixture model for classification. Proceedings of the 2016 8th International Conference on Information Technology in Medicine and Education (ITME), Fuzhou, China.","DOI":"10.1109\/ITME.2016.0139"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Kuang, P., Cao, W.N., and Wu, Q. (2014, January 19\u201321). Preview on structures and algorithms of deep learning. Proceedings of the 2014 11th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China.","DOI":"10.1109\/ICCWAMTIP.2014.7073385"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1797","DOI":"10.1109\/LGRS.2014.2309695","article-title":"Vehicle detection in satellite images by hybrid deep convolutional neural networks","volume":"11","author":"Chen","year":"2014","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Nguyen, K., Fookes, C., and Sridharan, S. (2015, January 27\u201330). Improving deep convolutional neural networks with unsupervised feature learning. Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada.","DOI":"10.1109\/ICIP.2015.7351206"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2529","DOI":"10.1109\/TIP.2016.2547588","article-title":"Text-attentional convolutional neural network for scene text detection","volume":"25","author":"He","year":"2016","journal-title":"IEEE Trans. Image Process."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wang, Y., Chen, Q., Zhu, Q., Liu, L., Li, C., and Zheng, D. (2019). A survey of mobile laser scanning applications and key techniques over urban areas. Remote Sens., 11.","DOI":"10.3390\/rs11131540"},{"key":"ref_29","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_31","unstructured":"Takikawa, T., Acuna, D., Jampani, V., and Fidler, S. (November, January 27). Gated-scnn: Gated shape cnns for semantic segmentation. 
Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_32","unstructured":"Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_38","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. 
Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Huang, Z., Huang, L., Gong, Y., Huang, C., and Wang, X. (2019, January 16\u201320). Mask scoring r-cnn. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00657"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.isprsjprs.2017.05.002","article-title":"Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks","volume":"130","author":"Alshehhi","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Shrestha, S., and Vanneschi, L. (2018). Improved fully convolutional network with conditional random fields for building extraction. Remote Sens., 10.","DOI":"10.3390\/rs10071135"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.isprsjprs.2019.02.019","article-title":"Automatic building extraction from high-resolution aerial images and LiDAR data using gated residual refinement network","volume":"151","author":"Huang","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"2793","DOI":"10.1109\/TPAMI.2017.2750680","article-title":"Learning building extraction in aerial scenes with convolutional networks","volume":"40","author":"Yuan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Liu, P., Liu, X., Liu, M., Shi, Q., Yang, J., Xu, X., and Zhang, Y. (2019). Building footprint extraction from high-resolution images via spatial residual inception convolutional neural network. 
Remote Sens., 11.","DOI":"10.3390\/rs11070830"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"128774","DOI":"10.1109\/ACCESS.2019.2940527","article-title":"Automatic building extraction on high-resolution remote sensing imagery using deep convolutional encoder-decoder with spatial pyramid pooling","volume":"7","author":"Liu","year":"2019","journal-title":"IEEE Access"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1842","DOI":"10.1109\/JSTARS.2020.2991391","article-title":"Refined Extraction of Building Outlines from High-resolution Remote Sensing Imagery Based on a Multifeature Convolutional Neural Network and Morphological Filtering","volume":"13","author":"Xie","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention\u2014MICCAI 2015, Springer.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"2920","DOI":"10.1109\/TGRS.2018.2878510","article-title":"Aerial LaneNet: Lane-marking semantic segmentation in aerial imagery using wavelet-enhanced cost-sensitive symmetric fully convolutional neural networks","volume":"57","author":"Azimi","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_50","unstructured":"Lin, M., Chen, Q., and Yan, S. (2013). Network in network. arXiv."},{"key":"ref_51","unstructured":"Mnih, V. (2013). 
Machine Learning for Aerial Image Labeling, University of Toronto."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Singh, P., and Komodakis, N. (2018, January 22\u201327). Effective Building Extraction by Learning to Detect and Correct Erroneous Labels in Segmentation Mask. Proceedings of the IGARSS 2018\u20142018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain.","DOI":"10.1109\/IGARSS.2018.8517854"},{"key":"ref_54","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20138). Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1109\/LGRS.2018.2868880","article-title":"Objects segmentation from high-resolution aerial images using U-Net with pyramid pooling layers","volume":"16","author":"Kim","year":"2018","journal-title":"IEEE Geosci. Remote Sens. 
Lett."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/15\/2350\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:50:43Z","timestamp":1760176243000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/15\/2350"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,22]]},"references-count":56,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2020,8]]}},"alternative-id":["rs12152350"],"URL":"https:\/\/doi.org\/10.3390\/rs12152350","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,22]]}}}