{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T18:16:13Z","timestamp":1774376173068,"version":"3.50.1"},"reference-count":57,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T00:00:00Z","timestamp":1641772800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["No.61973036"],"award-info":[{"award-number":["No.61973036"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Semantic segmentation is one of the significant tasks in understanding aerial images with high spatial resolution. Recently, Graph Neural Network (GNN) and attention mechanism have achieved excellent performance in semantic segmentation tasks in general images and been applied to aerial images. In this paper, we propose a novel Superpixel-based Attention Graph Neural Network (SAGNN) for semantic segmentation of high spatial resolution aerial images. A K-Nearest Neighbor (KNN) graph is constructed from our network for each image, where each node corresponds to a superpixel in the image and is associated with a hidden representation vector. On this basis, the initialization of the hidden representation vector is the appearance feature extracted by a unary Convolutional Neural Network (CNN) from the image. Moreover, relying on the attention mechanism and recursive functions, each node can update its hidden representation according to the current state and the incoming information from its neighbors. The final representation of each node is used to predict the semantic class of each superpixel. 
The attention mechanism enables graph nodes to differentially aggregate neighbor information, which can extract higher-quality features. Furthermore, the superpixels not only save computational resources, but also maintain object boundary to achieve more accurate predictions. The accuracy of our model on the Potsdam and Vaihingen public datasets exceeds all benchmark approaches, reaching 90.23% and 89.32%, respectively.<\/jats:p>","DOI":"10.3390\/rs14020305","type":"journal-article","created":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T22:03:13Z","timestamp":1641852193000},"page":"305","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":40,"title":["Superpixel-Based Attention Graph Neural Network for Semantic Segmentation in Aerial Images"],"prefix":"10.3390","volume":"14","author":[{"given":"Qi","family":"Diao","sequence":"first","affiliation":[{"name":"Beijing Institute of Technology, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8795-5333","authenticated-orcid":false,"given":"Yaping","family":"Dai","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5100-3584","authenticated-orcid":false,"given":"Ce","family":"Zhang","sequence":"additional","affiliation":[{"name":"Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YQ, UK"},{"name":"UK Centre for Ecology & Hydrology, Library Avenue, Lancaster LA1 4AP, UK"}]},{"given":"Yan","family":"Wu","sequence":"additional","affiliation":[{"name":"Robotics & Autonomous Systems Department, A*STAR Institute for Infocomm Research, Singapore 138632, Singapore"}]},{"given":"Xiaoxue","family":"Feng","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"Feng","family":"Pan","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing 100081, 
China"},{"name":"Kunming-BIT Industry Technology Research Institute Inc., Kunming 650106, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"7092","DOI":"10.1109\/TGRS.2017.2740362","article-title":"High-resolution aerial image labeling with convolutional neural networks","volume":"55","author":"Maggiori","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3357","DOI":"10.1109\/TIP.2019.2896492","article-title":"Automatic land cover reconstruction from historical aerial images: An evaluation of features extraction and classification algorithms","volume":"28","author":"Ratajczak","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2320","DOI":"10.1016\/j.rse.2011.04.032","article-title":"Mapping urbanization dynamics at regional and global scales using multi-temporal DMSP\/OLS nighttime light data","volume":"115","author":"Zhang","year":"2011","journal-title":"Remote Sens. Environ."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1777","DOI":"10.1109\/LGRS.2019.2953523","article-title":"Fully convolutional network-based ensemble method for road extraction from aerial images","volume":"17","author":"Zhang","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1304","DOI":"10.1109\/TGRS.2014.2337658","article-title":"Analysis of oblique aerial images for land cover and point cloud classification in an urban environment","volume":"53","author":"Rau","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1437","DOI":"10.1109\/TIP.2007.894239","article-title":"Classification-driven watershed segmentation","volume":"16","author":"Levner","year":"2007","journal-title":"IEEE Trans. 
Image Process."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"592","DOI":"10.1016\/j.patcog.2007.06.014","article-title":"Annealing and the normalized N-cut","volume":"41","author":"Gedeon","year":"2008","journal-title":"Pattern Recognit."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1145\/1015706.1015720","article-title":"\u201cGrabCut\u201d interactive foreground extraction using iterated graph cuts","volume":"23","author":"Rother","year":"2004","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Xiao, T., Liu, Y., Zhou, B., Jiang, Y., and Sun, J. (2018, January 8\u201314). Unified perceptual parsing for scene understanding. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_26"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Zhang, X., Peng, C., Xue, X., and Sun, J. (2018, January 8\u201314). Exfuse: Enhancing feature fusion for semantic segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_17"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (2019, January 27\u201328). Ccnet: Criss-cross attention for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00069"},{"key":"ref_13","unstructured":"Veli\u010dkovi\u0107, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. 
arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yu, W., Zheng, C., Cheng, W., Aggarwal, C.C., Song, D., Zong, B., Chen, H., and Wang, W. (2018, January 19\u201323). Learning deep network representations with adversarially regularized autoencoders. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK.","DOI":"10.1145\/3219819.3220000"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liang, X., Shen, X., Feng, J., Lin, L., and Yan, S. (2016, January 11\u201314). Semantic object parsing with graph lstm. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_8"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liang, X., Lin, L., Shen, X., Feng, J., Yan, S., and Xing, E.P. (2017, January 21\u201326). Interpretable structure-evolving lstm. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.234"},{"key":"ref_17","unstructured":"Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_19","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Caesar, H., Uijlings, J., and Ferrari, V. (2018, January 18\u201322). Coco-stuff: Thing and stuff classes in context. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00132"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_25","unstructured":"Waqas Zamir, S., Arora, A., Gupta, A., Khan, S., Sun, G., Shahbaz Khan, F., Zhu, F., Shao, L., Xia, G.S., and Bai, X. (2019, January 16\u201317). isaid: A large-scale dataset for instance segmentation in aerial images. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/j.isprsjprs.2018.06.007","article-title":"Deep multi-task learning for a geographically-regularized semantic segmentation of aerial images","volume":"144","author":"Volpi","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3492","DOI":"10.1109\/JSTARS.2019.2930724","article-title":"High-Resolution Aerial Images Semantic Segmentation Using Deep Fully Convolutional Network With Channel Attention Mechanism","volume":"12","author":"Luo","year":"2019","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_28","first-page":"5603018","article-title":"Hybrid multiple attention network for semantic segmentation in aerial images","volume":"60","author":"Niu","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Li, X., He, H., Li, X., Li, D., Cheng, G., Shi, J., Weng, L., Tong, Y., and Lin, Z. (2021, January 19\u201325). PointFlow: Flowing semantics through points for aerial image segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00420"},{"key":"ref_30","first-page":"3844","article-title":"Convolutional neural networks on graphs with fast localized spectral filtering","volume":"29","author":"Defferrard","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","unstructured":"Dai, H., Kozareva, Z., Dai, B., Smola, A., and Song, L. (2018, January 10\u201315). Learning steady-states of iterative algorithms over graphs. Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_32","unstructured":"Gori, M., Monfardini, G., and Scarselli, F. 
(August, January 31). A new model for learning in graph domains. Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada."},{"key":"ref_33","unstructured":"Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. (2015). Gated graph sequence neural networks. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","article-title":"The graph neural network model","volume":"20","author":"Scarselli","year":"2008","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Tai, K.S., Socher, R., and Manning, C.D. (2015). Improved semantic representations from tree-structured long short-term memory networks. arXiv.","DOI":"10.3115\/v1\/P15-1150"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lee, J.B., Rossi, R., and Kong, X. (2018, January 19\u201323). Graph classification using structural attention. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK.","DOI":"10.1145\/3219819.3219980"},{"key":"ref_37","unstructured":"Thekumparampil, K.K., Wang, C., Oh, S., and Li, L.J. (2018). Attention-based graph neural network for semi-supervised learning. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Tu, K., Cui, P., Wang, X., Yu, P.S., and Zhu, W. (2018, January 19\u201323). Deep recursive network embedding with regular equivalence. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK.","DOI":"10.1145\/3219819.3220068"},{"key":"ref_39","unstructured":"Bojchevski, A., Shchur, O., Z\u00fcgner, D., and G\u00fcnnemann, S. (2018, January 10\u201315). Netgan: Generating graphs via random walks. Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_40","unstructured":"You, J., Ying, R., Ren, X., Hamilton, W., and Leskovec, J. (2018, January 10\u201315). 
Graphrnn: Generating realistic graphs with deep auto-regressive models. Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"106746","DOI":"10.1016\/j.knosys.2021.106746","article-title":"STGSN\u2014A Spatial\u2013Temporal Graph Neural Network framework for time-evolving social networks","volume":"214","author":"Min","year":"2021","journal-title":"Knowl. Based Syst."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Tao, Y., Wang, C., Yao, L., Li, W., and Yu, Y. (2021). Item trend learning for sequential recommendation system using gated graph neural network. Neural Comput. Appl., 1\u201316.","DOI":"10.1007\/s00521-021-05723-2"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhao, C., Liu, S., Huang, F., Liu, S., and Zhang, W. (2021, January 19\u201327). CSGNN: Contrastive self-supervised graph neural network for molecular interaction prediction. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Online.","DOI":"10.24963\/ijcai.2021\/517"},{"key":"ref_44","unstructured":"Youn, C.H., and Linh, V.L. (2021, January 20\u201322). Dynamic graph neural network for super-pixel image classification. Proceedings of the 2021 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Korea."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Avelar, P.H., Tavares, A.R., da Silveira, T.L., Jung, C.R., and Lamb, L.C. (2020, January 7\u201310). Superpixel image classification with graph attention networks. 
Proceedings of the 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Porto de Galinhas, Brazil.","DOI":"10.1109\/SIBGRAPI51738.2020.00035"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"012071","DOI":"10.1088\/1742-6596\/1871\/1\/012071","article-title":"A Graph Neural Network for Superpixel Image Classification","volume":"1871","author":"Long","year":"2021","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Qi, X., Liao, R., Jia, J., Fidler, S., and Urtasun, R. (2017, January 22\u201329). 3d graph neural networks for rgbd semantic segmentation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.556"},{"key":"ref_48","first-page":"1","article-title":"Dynamic graph cnn for learning on point clouds","volume":"38","author":"Wang","year":"2019","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Landrieu, L., and Simonovsky, M. (2018, January 18\u201323). Large-scale point cloud semantic segmentation with superpoint graphs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00479"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"2274","DOI":"10.1109\/TPAMI.2012.120","article-title":"SLIC superpixels compared to state-of-the-art superpixel methods","volume":"34","author":"Achanta","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_51","unstructured":"Rottensteiner, F., Sohn, G., Gerke, M., and Wegner, J.D. (2014). ISPRS Semantic Labeling Contest, ISPRS."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017). A review on deep learning techniques applied to semantic segmentation. 
arXiv.","DOI":"10.1016\/j.asoc.2018.05.018"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Pan, X., Shi, J., Luo, P., Wang, X., and Tang, X. (2018, January 2\u20137). Spatial as deep: Spatial cnn for traffic scene understanding. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12301"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1016\/j.isprsjprs.2018.01.021","article-title":"Land cover mapping at very high resolution with rotation equivariant CNNs: Towards small yet accurate models","volume":"145","author":"Marcos","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Zhang, M., Hu, X., Zhao, L., Lv, Y., Luo, M., and Pang, S. (2017). Learning dual multi-scale manifold ranking for semantic segmentation of high-resolution images. Remote Sens., 9.","DOI":"10.20944\/preprints201704.0061.v1"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Atik, S.O., and Ipbuker, C. (2021). Integrating Convolutional Neural Network and Multiresolution Segmentation for Land Cover and Land Use Mapping Using Satellite Imagery. Appl. 
Sci., 11.","DOI":"10.3390\/app11125551"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/2\/305\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:14:46Z","timestamp":1760364886000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/2\/305"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,10]]},"references-count":57,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,1]]}},"alternative-id":["rs14020305"],"URL":"https:\/\/doi.org\/10.3390\/rs14020305","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,10]]}}}