{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T10:11:01Z","timestamp":1765966261714,"version":"3.48.0"},"reference-count":28,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T00:00:00Z","timestamp":1765929600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["42171435"],"award-info":[{"award-number":["42171435"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100007129","name":"Shandong Provincial Natural Science Foundation","doi-asserted-by":"publisher","award":["ZR2024QD012"],"award-info":[{"award-number":["ZR2024QD012"]}],"id":[{"id":"10.13039\/501100007129","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100007129","name":"Shandong Provincial Natural Science Foundation","doi-asserted-by":"publisher","award":["ZR2021MD006"],"award-info":[{"award-number":["ZR2021MD006"]}],"id":[{"id":"10.13039\/501100007129","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Postgraduate Education and Teaching Reform Foundation of Shandong Province","award":["SDYJG19115"],"award-info":[{"award-number":["SDYJG19115"]}]},{"name":"Undergraduate Education and Teaching Reform Foundation of Shandong Province","award":["Z2021014"],"award-info":[{"award-number":["Z2021014"]}]},{"name":"Youth Innovation Team Project of Higher School in Shandong Province","award":["2023KJ121"],"award-info":[{"award-number":["2023KJ121"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>An accurate and reliable extraction of building structures from high-resolution (HR) remote sensing images is an important research topic in 3D cartography and smart city construction. However, despite the strong overall performance of recent deep learning models, limitations remain in handling significant variations in building scales and complex architectural forms, which may lead to inaccurate boundaries or difficulties in extracting small or irregular structures. Therefore, the present study proposes MSA-UNet, a reliable semantic segmentation framework that leverages multiscale feature aggregation and attentive skip connections for an accurate extraction of building footprints. This framework is constructed based on the U-Net architecture, incorporating VGG16 as a replacement for the original encoder structure, which enhances its ability to capture low-discriminative features. To further improve the representation of image buildings with different scales and shapes, a serial coarse-to-fine feature aggregation mechanism was used. Additionally, a novel skip connection was built between the encoder and decoder layers to enable adaptive weights. Furthermore, a dual-attention mechanism, implemented through the convolutional block attention module, was integrated to enhance the focus of the network on building regions. Extensive experiments conducted on the WHU and Inria building datasets validated the effectiveness of MSA-UNet. On the WHU dataset, the model demonstrated a state-of-the-art performance with a mean Intersection over Union (mIoU) of 94.26%, accuracy of 98.32%, F1-score of 96.57%, and mean Pixel accuracy (mPA) of 96.85%, corresponding to gains of 1.41% in mIoU over the baseline U-Net. On the more challenging Inria dataset, MSA-UNet achieved an mIoU of 85.92%, indicating a consistent improvement of up to 1.9% over the baseline U-Net. These results confirmed that MSA-UNet markedly improved the accuracy and boundary integrity of building extraction from HR data, outperforming existing classic models in terms of segmentation quality and robustness.<\/jats:p>","DOI":"10.3390\/ijgi14120497","type":"journal-article","created":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T09:43:46Z","timestamp":1765964626000},"page":"497","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["MSA-UNet: Multiscale Feature Aggregation with Attentive Skip Connections for Precise Building Extraction"],"prefix":"10.3390","volume":"14","author":[{"given":"Guobiao","family":"Yao","sequence":"first","affiliation":[{"name":"School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan 250101, China"},{"name":"School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430070, China"}]},{"given":"Yan","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan 250101, China"}]},{"given":"Wenxiao","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan 250101, China"}]},{"given":"Zeyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan 250101, China"}]},{"given":"Yifei","family":"Tang","sequence":"additional","affiliation":[{"name":"School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan 250101, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2964-7698","authenticated-orcid":false,"given":"Jingxue","family":"Bi","sequence":"additional","affiliation":[{"name":"School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan 250101, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"6994","DOI":"10.1109\/JSTARS.2025.3538662","article-title":"Advances and future prospects in building extraction from high-resolution remote sensing images","volume":"18","author":"Yang","year":"2025","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"721","DOI":"10.14358\/PERS.77.7.721","article-title":"Photogrammetric engineering & remote sensing","volume":"77","author":"Huang","year":"2011","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_3","first-page":"58","article-title":"Building extraction from high-resolution optical spaceborne images using the integration of support vector machine (SVM) classification, Hough transformation and perceptual grouping","volume":"34","author":"Turker","year":"2015","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Roffey, M., and Leblanc, S.G. (2021). A novel framework for rapid detection of damaged buildings using pre-event LiDAR data and shadow change information. Remote Sens., 13.","DOI":"10.3390\/rs13163297"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Jung, S., Lee, K., and Lee, W.H. (2022). Object-based high-rise building detection using morphological building index and digital map. Remote Sens., 14.","DOI":"10.3390\/rs14020330"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"123101","DOI":"10.1007\/s11432-017-9252-5","article-title":"Semantic segmentation of high-resolution images","volume":"60","author":"Wang","year":"2017","journal-title":"Sci. China Inf. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","article-title":"Fully convolutional networks for semantic segmentation","volume":"39","author":"Shelhamer","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_10","unstructured":"Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., and Wang, M. (2022, January 23\u201327). Swin-Unet: Unet-like pure transformer for medical image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25066-8_9"},{"key":"ref_14","unstructured":"Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., and Luo, P. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Chen, K., Zou, Z., and Shi, Z. (2021). Building extraction from remote sensing images with sparse token transformers. Remote Sens., 13.","DOI":"10.3390\/rs13214441"},{"key":"ref_16","first-page":"6008405","article-title":"DSAT-Net: Dual spatial attention transformer for building extraction from aerial images","volume":"20","author":"Zhang","year":"2023","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Xia, L., Mi, S., Zhang, J., Luo, J., Shen, Z., and Cheng, Y. (2023). Dual-stream feature extraction network based on CNN and transformer for building extraction. Remote Sens., 15.","DOI":"10.3390\/rs15102689"},{"key":"ref_18","unstructured":"Oktay, O., Schlemper, J., Le Folgoc, L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., and Kainz, B. (2018). Attention U-Net: Learning where to look for the pancreas. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ye, Z., Fu, Y., Gan, M., Deng, J., Comber, A., and Wang, K. (2019). Building extraction from very high resolution aerial imagery using joint attention deep neural network. Remote Sens., 11.","DOI":"10.3390\/rs11242970"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"6005305","DOI":"10.1109\/LGRS.2023.3272353","article-title":"BEARNet: A novel buildings edge-aware refined network for building extraction from high-resolution remote sensing images","volume":"20","author":"Lin","year":"2023","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Xue, H., Liu, K., Wang, Y., Chen, Y., Huang, C., Wang, P., and Li, L. (2024). MAD-UNet: A multi-region UAV remote sensing network for rural building extraction. Sensors, 24.","DOI":"10.3390\/s24082393"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3679","DOI":"10.1109\/TCSVT.2024.3509504","article-title":"DSNet: A novel way to use atrous convolutions in semantic segmentation","volume":"35","author":"Guo","year":"2025","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2504605","DOI":"10.1109\/LGRS.2024.3439100","article-title":"DHI-Net: A novel detail-preserving and hierarchical interaction network for building extraction","volume":"21","author":"Song","year":"2024","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_24","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_25","unstructured":"Zheng, M., Sun, L., Dong, J., and Pan, J. (October, January 29). SMFANet: A lightweight self-modulation feature aggregation network for efficient image super-resolution. Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1109\/TGRS.2018.2858817","article-title":"Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set","volume":"57","author":"Ji","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Maggiori, E., Tarabalka, Y., Charpiat, G., and Alliez, P. (2017, January 23\u201328). Can semantic labeling methods generalize to any city? The INRIA aerial image labeling benchmark. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8127684"}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/14\/12\/497\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T10:03:32Z","timestamp":1765965812000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/14\/12\/497"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,17]]},"references-count":28,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["ijgi14120497"],"URL":"https:\/\/doi.org\/10.3390\/ijgi14120497","relation":{},"ISSN":["2220-9964"],"issn-type":[{"value":"2220-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,17]]}}}