{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:55:29Z","timestamp":1774630529249,"version":"3.50.1"},"reference-count":46,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2021,6,24]],"date-time":"2021-06-24T00:00:00Z","timestamp":1624492800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Automatic building extraction has been applied in many domains. It is also a challenging problem because of the complex scenes and multiscale. Deep learning algorithms, especially fully convolutional neural networks (FCNs), have shown robust feature extraction ability than traditional remote sensing data processing methods. However, hierarchical features from encoders with a fixed receptive field perform weak ability to obtain global semantic information. Local features in multiscale subregions cannot construct contextual interdependence and correlation, especially for large-scale building areas, which probably causes fragmentary extraction results due to intra-class feature variability. In addition, low-level features have accurate and fine-grained spatial information for tiny building structures but lack refinement and selection, and the semantic gap of across-level features is not conducive to feature fusion. To address the above problems, this paper proposes an FCN framework based on the residual network and provides the training pattern for multi-modal data combining the advantage of high-resolution aerial images and LiDAR data for building extraction. Two novel modules have been proposed for the optimization and integration of multiscale and across-level features. In particular, a multiscale context optimization module is designed to adaptively generate the feature representations for different subregions and effectively aggregate global context. A semantic guided spatial attention mechanism is introduced to refine shallow features and alleviate the semantic gap. Finally, hierarchical features are fused via the feature pyramid network. 
Compared with other state-of-the-art methods, the proposed network achieves superior performance, with 93.19 IoU and 97.56 OA on the WHU dataset and 94.72 IoU and 97.84 OA on the Boston dataset, demonstrating improved accuracy for building extraction.<\/jats:p>","DOI":"10.3390\/rs13132473","type":"journal-article","created":{"date-parts":[[2021,6,24]],"date-time":"2021-06-24T11:01:38Z","timestamp":1624532498000},"page":"2473","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["Multiscale Semantic Feature Optimization and Fusion Network for Building Extraction Using High-Resolution Aerial Images and LiDAR Data"],"prefix":"10.3390","volume":"13","author":[{"given":"Qinglie","family":"Yuan","sequence":"first","affiliation":[{"name":"Department of Civil Engineering and Geospatial Information Science Research Centre (GISRC), Faculty of Engineering, Universiti Putra Malaysia (UPM), Serdang 43400, Selangor, Malaysia"},{"name":"Department of Geographic Information and Remote Sensing Research Centre, School of Civil and Architecture Engineering, Panzhihua University, Panzhihua 617000, China"}]},{"given":"Helmi Zulhaidi Mohd","family":"Shafri","sequence":"additional","affiliation":[{"name":"Department of Civil Engineering and Geospatial Information Science Research Centre (GISRC), Faculty of Engineering, Universiti Putra Malaysia (UPM), Serdang 43400, Selangor, Malaysia"}]},{"given":"Aidi Hizami","family":"Alias","sequence":"additional","affiliation":[{"name":"Department of Civil Engineering and Geospatial Information Science Research Centre (GISRC), Faculty of Engineering, Universiti Putra Malaysia (UPM), Serdang 43400, Selangor, Malaysia"}]},{"given":"Shaiful Jahari bin","family":"Hashim","sequence":"additional","affiliation":[{"name":"Department of Civil Engineering and Geospatial Information Science Research Centre (GISRC), Faculty of Engineering, Universiti Putra Malaysia (UPM), Serdang 43400, Selangor, Malaysia"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"745309","DOI":"10.1155\/ASP.2005.2196","article-title":"Automated building extraction from high-resolution satellite imagery in urban areas using structural, contextual, and spectral information","volume":"2005","author":"Jin","year":"2005","journal-title":"EURASIP J. Adv. Signal Process."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1109\/JSTARS.2011.2168195","article-title":"Morphological Building\/Shadow Index for Building Extraction from High-Resolution Imagery over Urban Areas","volume":"5","author":"Huang","year":"2012","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1109\/JSTARS.2008.2002869","article-title":"A robust built-up area presence index by anisotropic rotation-invariant textural measure","volume":"1","author":"Pesaresi","year":"2008","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"5094","DOI":"10.1080\/01431161.2014.933278","article-title":"Automatic building extraction in dense urban areas through GeoEye multispectral imagery","volume":"35","author":"Ghanea","year":"2014","journal-title":"Int. J. 
Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1016\/j.rcim.2019.03.001","article-title":"Real-time detection of surface deformation and strain in recycled aggregate concrete-filled steel tubular columns via four-ocular vision","volume":"59","author":"Tang","year":"2019","journal-title":"Robot. Comput.-Integr. Manuf."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"510","DOI":"10.3389\/fpls.2020.00510","article-title":"Recognition and localization methods for vision-based fruit picking robots: A review","volume":"11","author":"Tang","year":"2020","journal-title":"Front. Plant Sci."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Gharibbafghi, Z., Tian, J., and Reinartz, P. (2018). Modified super-pixel segmentation for digital surface model refinement and building extraction from satellite stereo imagery. Remote Sens., 10.","DOI":"10.3390\/rs10111824"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1109\/TGRS.2010.2053713","article-title":"A probabilistic framework to detect buildings in aerial and satellite images","volume":"49","author":"Sirmacek","year":"2010","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1127","DOI":"10.1080\/01431161.2016.1148283","article-title":"Building extraction in satellite images using active contours and color features","volume":"37","author":"Liasis","year":"2016","journal-title":"Int. J. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1016\/j.isprsjprs.2013.12.002","article-title":"Ground and building extraction from LiDAR data based on differential morphological profiles and locally fitted surfaces","volume":"93","author":"Mongus","year":"2014","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1016\/j.isprsjprs.2017.06.005","article-title":"Automatic building extraction from LiDAR data fusion of point and grid-based features","volume":"130","author":"Du","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/j.infrared.2018.05.021","article-title":"A top-down strategy for buildings extraction from complex urban scenes using airborne LiDAR point clouds","volume":"92","author":"Huang","year":"2018","journal-title":"Infrared Phys. Technol."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/j.isprsjprs.2018.08.009","article-title":"Extraction of residential building instances in suburban areas from mobile LiDAR data","volume":"144","author":"Xia","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lai, X., Yang, J., Li, Y., and Wang, M. (2019). A building extraction approach based on the fusion of LiDAR point cloud and elevation map texture features. Remote Sens., 14.","DOI":"10.3390\/rs11141636"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Tang, Y., Chen, M., Lin, Y., Huang, X., Huang, K., He, Y., and Li, L. (2020). Vision-Based Three-Dimensional Reconstruction and Monitoring of Large-Scale Steel Tubular Structures. Adv. Civ. Eng., 2020.","DOI":"10.1155\/2020\/1236021"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.neucom.2019.12.098","article-title":"Object-based multi-modal convolution neural networks for building extraction using panchromatic and multispectral imagery","volume":"386","author":"Chen","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.isprsjprs.2019.05.013","article-title":"Improving public data for building segmentation from Convolutional Neural Networks (CNNs) for fused airborne Lidar and image data using active contours","volume":"154","author":"Griffiths","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"7502","DOI":"10.1109\/TGRS.2020.2973720","article-title":"Building Footprint Generation by Integrating Convolution Neural Network with Feature Pairwise Conditional Random Field (FPCRF)","volume":"58","author":"Li","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yang, H., Wu, P., Yao, X., Wu, Y., Wang, B., and Xu, Y. (2018). Building extraction in very high resolution imagery by dense-attention networks. Remote Sens., 24.","DOI":"10.3390\/rs10111768"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ye, Z., Fu, Y., Gan, M., Deng, J., Comber, A., and Wang, K. (2019). Building Extraction from Very High Resolution Aerial Imagery Using Joint Attention Deep Neural Network. Remote Sens., 11.","DOI":"10.3390\/rs11242970"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.isprsjprs.2019.02.019","article-title":"Automatic building extraction from high-resolution aerial images and LiDAR data using gated residual refinement network","volume":"151","author":"Huang","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Pan, X., Yang, F., Gao, L., Chen, Z., Zhang, B., Fan, H., and Ren, J. (2019). Building extraction from high-resolution aerial imagery using a generative adversarial network with spatial and channel attention mechanisms. Remote Sens., 11.","DOI":"10.3390\/rs11080917"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2615","DOI":"10.1109\/JSTARS.2018.2849363","article-title":"Building footprint extraction from VHR remote sensing images combined with normalized DSMs using fused fully convolutional networks","volume":"11","author":"Bittner","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1016\/j.isprsjprs.2019.11.004","article-title":"Building segmentation through a gated graph convolutional neural network with deep structured feature embedding","volume":"159","author":"Shi","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). 
Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"12","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted intervention, Springer.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I. (2017, January 21\u201326). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_31","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_32","unstructured":"Liu, W., Rabinovich, A., and Berg, A.C. (2015). Parsenet: Looking wider to see better. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15). Dual attention network for scene segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (2019, January 27). Ccnet: Criss-cross attention for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00069"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Xu, M., Bai, S., Huang, T., and Bai, X. (2019, January 27). Asymmetric non-local neural networks for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00068"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Liu, H., Peng, C., Yu, C., Wang, J., Liu, X., Yu, G., and Jiang, W. (2019, January 15). An end-to-end network for panoptic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00633"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1109\/TGRS.2018.2858817","article-title":"Fully Convolutional Networks for Multisource Building Extraction from an Open Aerial and Satellite Imagery dataset","volume":"57","author":"Ji","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_42","unstructured":"USGS (2021, May 10). Available online: https:\/\/earthexplorer.usgs.gov\/."},{"key":"ref_43","unstructured":"NOAA (2021, May 10). Available online: https:\/\/coast.noaa.gov\/dataviewer\/."},{"key":"ref_44","unstructured":"Mnih, V. (2013). Machine Learning for Aerial Image Labeling. [Ph.D. Thesis, University of Toronto]."},{"key":"ref_45","unstructured":"(2021, May 10). CloudCompare. Available online: http:\/\/www.cloudcompare.org\/."},{"key":"ref_46","unstructured":"Glorot, X., and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/13\/2473\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:23:24Z","timestamp":1760163804000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/13\/2473"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,24]]},"references-count":46,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["rs13132473"],"URL":"https:\/\/doi.org\/10.3390\/rs13132473","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,24]]}}}