{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T22:24:11Z","timestamp":1769552651018,"version":"3.49.0"},"reference-count":44,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2022,7,3]],"date-time":"2022-07-03T00:00:00Z","timestamp":1656806400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62073304"],"award-info":[{"award-number":["62073304"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41977242"],"award-info":[{"award-number":["41977242"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61973283"],"award-info":[{"award-number":["61973283"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Building segmentation for Unmanned Aerial Vehicle (UAV) imagery usually requires pixel-level labels, which are time-consuming and expensive to collect. Weakly supervised semantic segmentation methods for image-level labeling have recently achieved promising performance in natural scenes, but there have been few studies on UAV remote sensing imagery. In this paper, we propose a reliable label-supervised pixel attention mechanism for building segmentation in UAV imagery. Our method is based on the class activation map. However, classification networks tend to capture discriminative parts of the object and are insensitive to over-activation; therefore, class activation maps cannot directly guide segmentation network training. To overcome these challenges, we first design a Pixel Attention Module that captures rich contextual relationships, which can further mine more discriminative regions, in order to obtain a modified class activation map. Then, we use the initial seeds generated by the classification network to synthesize reliable labels. Finally, we design a reliable label loss, which is defined as the sum of the pixel-level differences between the reliable labels and the modified class activation map. Notably, the reliable label loss can handle over-activation. The preceding steps can significantly improve the quality of the pseudo-labels. Experiments on our home-made UAV data set indicate that our method can achieve 88.8% mIoU on the test set, outperforming previous state-of-the-art weakly supervised methods.<\/jats:p>","DOI":"10.3390\/rs14133196","type":"journal-article","created":{"date-parts":[[2022,7,4]],"date-time":"2022-07-04T20:59:18Z","timestamp":1656968358000},"page":"3196","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Reliable Label-Supervised Pixel Attention Mechanism for Weakly Supervised Building Segmentation in UAV Imagery"],"prefix":"10.3390","volume":"14","author":[{"given":"Jun","family":"Chen","sequence":"first","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"},{"name":"Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, Wuhan 430074, China"}]},{"given":"Weifeng","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"},{"name":"Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, Wuhan 430074, China"}]},{"given":"Yang","family":"Yu","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China"},{"name":"Key Laboratory of Infrared System Detecting and Imaging Technology, Chinese Academy of Sciences, Shanghai 200083, China"}]},{"given":"Chengli","family":"Peng","sequence":"additional","affiliation":[{"name":"Electronic Information School, Wuhan University, Wuhan 430072, China"}]},{"given":"Wenping","family":"Gong","sequence":"additional","affiliation":[{"name":"Faculty of Engineering, China University of Geosciences, Wuhan 430074, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"107498","DOI":"10.1016\/j.patcog.2020.107498","article-title":"Semantic segmentation using stride spatial pyramid pooling and dual attention decoder","volume":"107","author":"Peng","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1016\/j.neunet.2021.01.021","article-title":"Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation","volume":"137","author":"Peng","year":"2021","journal-title":"Neural Netw."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing And Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Fu, K., Lu, W., Diao, W., Yan, M., Sun, H., Zhang, Y., and Sun, X. (2020). WSF-NET: Weakly supervised feature-fusion network for binary segmentation in remote sensing image. Remote Sens., 10.","DOI":"10.3390\/rs10121970"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Pathak, D., Krahenbuhl, P., and Darrell, T. (2015, January 11\u201318). Constrained convolutional neural networks for weakly supervised segmentation. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.209"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Pinheiro, P.O., and Collobert, R. (2015, January 7\u201312). From image-level to pixel-level labeling with convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298780"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016, January 27\u201330). Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.319"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wang, Y., Zhang, J., Kan, M., Shan, S., and Chen, X. (2020, January 14\u201319). Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01229"},{"key":"ref_11","first-page":"1","article-title":"Cross Fusion Net: A Fast Semantic Segmentation Network for Small-Scale Semantic Information Capturing in Aerial Scenes","volume":"60","author":"Peng","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2178","DOI":"10.1109\/TGRS.2019.2954461","article-title":"Toward automatic building footprint delineation from aerial images using CNN and regularization","volume":"58","author":"Wei","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1109\/TGRS.2018.2858817","article-title":"Fully Convolutional Networks for Multisource Building Extraction from an Open Aerial and Satellite Imagery Data Set","volume":"57","author":"Ji","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1968","DOI":"10.1109\/LGRS.2019.2960528","article-title":"DSSNet: A Simple Dilated Semantic Segmentation Network for Hyperspectral Imagery Classification","volume":"17","author":"Pan","year":"2020","journal-title":"IEEE Geosci. Remote. Sens. Lett."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"3308","DOI":"10.1080\/01431161.2018.1528024","article-title":"A scale robust convolutional neural network for automatic building extraction from aerial and satellite imagery","volume":"40","author":"Ji","year":"2019","journal-title":"Int. J. Remote Sens."},{"key":"ref_17","unstructured":"Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv."},{"key":"ref_18","first-page":"1","article-title":"ASF-Net: Adaptive Screening Feature Network for Building Footprint Extraction from Remote-Sensing Images","volume":"60","author":"Chen","year":"2022","journal-title":"Int. J. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wei, Y., Feng, J., Liang, X., Cheng, M.M., Zhao, Y., and Yan, S. (2017, January 21\u201326). Object region mining with adversarial erasing: A simple classification to semantic segmentation approach. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.687"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Bearman, A., Russakovsky, O., Ferrari, V., and Fei-Fei, L. (2016, January 7\u201313). What\u2019s the point: Semantic segmentation with point supervision. Proceedings of the European Conference on Computer Vision, Graz, Austria.","DOI":"10.1007\/978-3-319-46478-7_34"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, S., Chen, W., Xie, S.M., Azzari, G., and Lobell, D.B. (2020). Weakly supervised deep learning for segmentation of remote sensing imagery. Remote Sens., 12.","DOI":"10.3390\/rs12020207"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lin, D., Dai, J., Jia, J., He, K., and Sun, J. (2016, January 27\u201330). Scribblesup: Scribble-supervised convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.344"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Vernaza, P., and Chandraker, M. (2017, January 21\u201326). Learning random-walk label propagation for weakly-supervised semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.315"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"58898","DOI":"10.1109\/ACCESS.2018.2874544","article-title":"Scribble-supervised segmentation of aerial building footprints using adversarial learning","volume":"6","author":"Wu","year":"2018","journal-title":"IEEE Access."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Song, C., Huang, Y., Ouyang, W., and Wang, L. (2019, January 16\u201320). Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00325"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Rafique, M.U., and Jacobs, N. (August, January 28). Weakly Supervised Building Segmentation from Aerial Images. Proceedings of the IGARSS 2019\u20142019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8898812"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Guo, R., Sun, X., Chen, K., Zhou, X., Yan, Z., Diao, W., and Yan, M. (2020). Jmlnet: Joint multi-label learning network for weakly supervised semantic segmentation in aerial images. Remote Sens., 12.","DOI":"10.3390\/rs12193169"},{"key":"ref_28","unstructured":"Hou, Q., Jiang, P., Wei, Y., and Cheng, M.M. (2018, January 3\u20138). Self-erasing network for integral object attention. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, X., Wei, Y., Feng, J., Yang, Y., and Huang, T.S. (2018, January 18\u201323). Adversarial complementary learning for weakly supervised object localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,  Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00144"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ahn, J., and Kwak, S. (2018, January 18\u201323). Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00523"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ahn, J., Cho, S., and Kwak, S. (2019, January 16\u201320). Weakly supervised learning of instance segmentation with inter-pixel relations. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00231"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Sun, G., Wang, W., Dai, J., and Van Gool, L. (2020, January 23\u201328). Mining cross-image semantics for weakly supervised semantic segmentation. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58536-5_21"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, L., Wu, W., Fu, C., Han, X., and Zhang, Y. (2020, January 23\u201328). Weakly supervised semantic segmentation with boundary exploration. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58574-7_21"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Yao, Y., Chen, T., Xie, G.S., Zhang, C., Shen, F., Wu, Q., Tang, Z., and Zhang, J. (2021, January 19\u201325). Non-salient region object mining for weakly supervised semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00265"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"14413","DOI":"10.1109\/ACCESS.2020.2966647","article-title":"Saliency guided self-attention network for weakly and semi-supervised semantic segmentation","volume":"8","author":"Yao","year":"2020","journal-title":"IEEE Access."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lee, S., Lee, M., Lee, J., and Shim, H. (2021, January 19\u201325). Railroad is not a train: Saliency as pseudo-pixel supervision for weakly supervised semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00545"},{"key":"ref_37","unstructured":"Zeng, Y., Zhuge, Y., Lu, H., and Zhang, L. (November, January 27). Joint learning of saliency detection and weakly supervised semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Chen, J., He, F., Zhang, Y., Sun, G., and Deng, M. (2020). SPMF-Net: Weakly supervised building segmentation by combining superpixel pooling and multi-scale feature fusion. Remote Sens., 12.","DOI":"10.3390\/rs12061049"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, J., Liu, Y., Wu, P., Shi, Z., and Pan, B. (2022). Mining Cross-Domain Structure Affinity for Refined Building Segmentation in Weakly Supervised Constraints. Remote Sens., 14.","DOI":"10.3390\/rs14051227"},{"key":"ref_40","unstructured":"Krahenbuhl, P., and Koltun, V. (2011, January 12\u201315). Efficient inference in fully connected crfs with gaussian edge potentials. Proceedings of the 24th International Conference on Neural Information Processing Systems, Granada, Spain."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.patcog.2019.01.006","article-title":"Wider or deeper: Revisiting the resnet model for visual recognition","volume":"90","author":"Wu","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Li, F.-F. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_43","unstructured":"Chen, L.C., Kokkinos, I., Murphy, K., and Yuille, A.L. (2015, January 7\u20139). Semantic image segmentation with deep convolutional nets and fully connected crfs. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Lee, J., Kim, E., and Yoon, S. (2021, January 19\u201325). Anti-adversarially manipulated attributions for weakly and semi-supervised semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00406"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/13\/3196\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:42:19Z","timestamp":1760139739000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/13\/3196"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,3]]},"references-count":44,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["rs14133196"],"URL":"https:\/\/doi.org\/10.3390\/rs14133196","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,3]]}}}