{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T18:32:09Z","timestamp":1773772329711,"version":"3.50.1"},"reference-count":60,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2022,4,24]],"date-time":"2022-04-24T00:00:00Z","timestamp":1650758400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Remote sensing image scene classification is an important task of remote sensing image interpretation, which has recently been well addressed by the convolutional neural network owing to its powerful learning ability. However, due to the multiple types of geographical information and redundant background information of the remote sensing images, most of the CNN-based methods, especially those based on a single CNN model and those ignoring the combination of global and local features, exhibit limited performance on accurate classification. To compensate for such insufficiency, we propose a new dual-model deep feature fusion method based on an attention cascade global\u2013local network (ACGLNet). Specifically, we use two popular CNNs as the feature extractors to extract complementary multiscale features from the input image. Considering the characteristics of the global and local features, the proposed ACGLNet filters the redundant background information from the low-level features through the spatial attention mechanism, followed by which the locally attended features are fused with the high-level features. Then, bilinear fusion is employed to produce the fused representation of the dual model, which is finally fed to the classifier. 
Through extensive experiments on four public remote sensing scene datasets, including UCM, AID, PatternNet, and OPTIMAL-31, we demonstrate the feasibility of the proposed method and its superiority over the state-of-the-art scene classification methods.<\/jats:p>","DOI":"10.3390\/rs14092042","type":"journal-article","created":{"date-parts":[[2022,4,24]],"date-time":"2022-04-24T22:22:41Z","timestamp":1650838961000},"page":"2042","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["An Attention Cascade Global\u2013Local Network for Remote Sensing Scene Classification"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6563-9206","authenticated-orcid":false,"given":"Junge","family":"Shen","sequence":"first","affiliation":[{"name":"Unmanned System Research Institute, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"given":"Tianwei","family":"Yu","sequence":"additional","affiliation":[{"name":"Unmanned System Research Institute, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5137-1544","authenticated-orcid":false,"given":"Haopeng","family":"Yang","sequence":"additional","affiliation":[{"name":"Unmanned System Research Institute, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2730-9409","authenticated-orcid":false,"given":"Ruxin","family":"Wang","sequence":"additional","affiliation":[{"name":"Engineering Research Center of Cyberspace, School of Software, Yunnan University, Kunming 650106, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7028-4956","authenticated-orcid":false,"given":"Qi","family":"Wang","sequence":"additional","affiliation":[{"name":"Unmanned System Research Institute, Northwestern Polytechnical University, Xi\u2019an 710072, China"},{"name":"School of Artificial Intelligence, Optics and Electronics 
(iOPEN), Northwestern Polytechnical University, Xi\u2019an 710072, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Chen, L., Li, S., Bai, Q., Yang, J., Jiang, S., and Miao, Y. (2021). Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens., 13.","DOI":"10.3390\/rs13224712"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1007\/BF00130487","article-title":"Color indexing","volume":"7","author":"Swain","year":"1991","journal-title":"Int. J. Comput. Vis."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wang, J., Yang, Y.Y., Mao, J., Huang, Z., Huang, C., and Xu, W. (2016, January 27\u201330). CNN-RNN: A unified framework for multi-label image classification. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.251"},{"key":"ref_5","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20138). Imagenet classification with deep convolutional neural networks. Proceedings of the Neural Information Processing Systems Conference and Workshop, Lake Tahoe, NV, USA."},{"key":"ref_6","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. 
Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"8639367","DOI":"10.1155\/2018\/8639367","article-title":"A two-stream deep fusion framework for high-resolution aerial scene classification","volume":"2018","author":"Yu","year":"2018","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"67200","DOI":"10.1109\/ACCESS.2019.2918732","article-title":"Global-local attention network for aerial scene classification","volume":"7","author":"Guo","year":"2019","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Shen, J., Zhang, C., Zheng, Y., and Wang, R. (2021). Decision-Level Fusion with a Pluginable Importance Factor Generator for Remote Sensing Image Scene Classification. Remote Sens., 13.","DOI":"10.3390\/rs13183579"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Shen, J., Zhang, T., Wang, Y., Wang, R., and Wang, Q. (2021). A Dual-Model Architecture with Grouping-Attention-Fusion for Remote Sensing Scene Classification. Remote Sens., 13.","DOI":"10.3390\/rs13030433"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1049\/iet-cvi.2014.0270","article-title":"Auto-encoder-based shared mid-level visual dictionary learning for scene classification using very high resolution remote sensing images","volume":"9","author":"Cheng","year":"2015","journal-title":"IET Comput. Vis."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"4775","DOI":"10.1109\/TGRS.2017.2700322","article-title":"Deep feature fusion for VHR remote sensing scene classification","volume":"55","author":"Chaib","year":"2017","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"780","DOI":"10.1109\/36.752194","article-title":"Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices","volume":"37","author":"Soh","year":"1999","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1865","DOI":"10.1109\/JPROC.2017.2675998","article-title":"Remote sensing image scene classification: Benchmark and state-of-the-art","volume":"105","author":"Cheng","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1016\/S0031-3203(96)00068-4","article-title":"Object detection using Gabor filters","volume":"30","author":"Jain","year":"1997","journal-title":"Pattern Recognit."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1016\/j.ins.2016.02.021","article-title":"Scene classification using local and global features with collaborative representation fusion","volume":"348","author":"Zou","year":"2016","journal-title":"Inf. Sci."},{"key":"ref_18","first-page":"747","article-title":"Reducing the dimensionality of data with neural networks","volume":"13","author":"Hinton","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1527","DOI":"10.1162\/neco.2006.18.7.1527","article-title":"A fast learning algorithm for deep belief nets","volume":"18","author":"Hinton","year":"2006","journal-title":"Neural Comput."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. 
IEEE"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"3632943","DOI":"10.1155\/2016\/3632943","article-title":"Stacked Denoise autoencoder based feature extraction and classification for hyperspectral images","volume":"2016","author":"Xing","year":"2016","journal-title":"J. Sens."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1109\/TGRS.2012.2205158","article-title":"Geographic image retrieval using local invariant features","volume":"51","author":"Yang","year":"2013","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhou, Z., Zheng, Y., and Ye, H. (2018, January 21\u201322). Satellite image scene classification via convNet with context aggregation. Proceedings of the 19th Pacific-Rim Conference on Multimedia, Hefei, China.","DOI":"10.1007\/978-3-030-00767-6_31"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1779","DOI":"10.1109\/TGRS.2018.2869101","article-title":"Remote sensing image scene classification using rearranged local features","volume":"57","author":"Yuan","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_25","unstructured":"Castelluccio, M., Poggi, G., Sansone, C., and Verdoliva, L. (2015). Land use classification in remote sensing images by convolutional neural networks. 
arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1016\/j.patcog.2016.07.001","article-title":"Towards better exploiting convolutional neural networks for remote sensing scene classification","volume":"61","author":"Nogueira","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_27","first-page":"23","article-title":"A semi-supervised generative framework with deep learning features for high-resolution remote sensing image scene classification","volume":"145","author":"Han","year":"2018","journal-title":"Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"14680","DOI":"10.3390\/rs71114680","article-title":"Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery","volume":"7","author":"Hu","year":"2015","journal-title":"Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1109\/LGRS.2017.2779469","article-title":"Scene classification based on two-stage deep feature fusion","volume":"15","author":"Liu","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1080\/2150704X.2017.1415477","article-title":"Parallel multi-stage features fusion of deep convolutional neural networks for aerial scene classification","volume":"9","author":"Ye","year":"2018","journal-title":"Remote Sens. Lett."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1109\/LGRS.2017.2786241","article-title":"Aerial scene classification via multilevel fusion based on deep convolutional neural networks","volume":"15","author":"Yu","year":"2018","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"9295","DOI":"10.1007\/s00521-019-04281-y","article-title":"Context-Aware Attention Network for Image Recognition","volume":"31","author":"Leng","year":"2019","journal-title":"Neural Comput. Appl."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wu, X., Zhang, Z., Zhang, W., Yi, Y., Zhang, C., and Xu, Q. (2021). A convolutional neural network based on grouping structure for scene classification. Remote Sens., 13.","DOI":"10.3390\/rs13132457"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Shi, C., Zhao, X., and Wang, L. (2021). A multi-branch feature fusion strategy based on an attention mechanism for remote sensing image scene classification. Remote Sens., 13.","DOI":"10.3390\/rs13101950"},{"key":"ref_35","unstructured":"Jaderberg, M., Simonyan, K., and Zisserman, A. (2015, January 7\u201312). Spatial transformer networks. Proceedings of the Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., and Tang, X. (2017, January 21\u201326). Residual attention network for image classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.683"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1109\/TGRS.2018.2864987","article-title":"Scene classification with recurrent attention of VHR remote sensing images","volume":"57","author":"Wang","year":"2019","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"4121","DOI":"10.1109\/JSTARS.2020.3009352","article-title":"Channel-attention-based denseNet network for remote sensing image scene classification","volume":"13","author":"Tong","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"3862","DOI":"10.1109\/JSTARS.2020.3006241","article-title":"An augmentation attention mechanism for high-spatial-resolution remote sensing image scene classification","volume":"13","author":"Li","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"6344","DOI":"10.1109\/ACCESS.2019.2963769","article-title":"Scene classification of remote sensing images based on saliency dual attention residual network","volume":"8","author":"Guo","year":"2020","journal-title":"IEEE Access"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Fan, R., Wang, L., Feng, R., and Zhou, Y. (August, January 28). Attention based residual network for high-resolution remote sensing imagery scene classification. Proceedings of the IGARSS 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8900199"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zhao, T., and Wu, X. (2019, January 16\u201320). Pyramid feature attention network for saliency detection. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00320"},{"key":"ref_46","unstructured":"Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2015, January 7\u201313). Bilinear CNN models for fine-grained visual recognition. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Gao, Y., Beijbom, O., Zhang, N., and Darrell, T. (2016, January 27\u201330). Compact bilinear pooling. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.41"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Pham, N., and Pagh, R. (2013, January 11\u201314). Fast and scalable polynomial kernels via explicit feature maps. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA.","DOI":"10.1145\/2487575.2487591"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Yang, Y., and Newsam, S. (2010, January 2\u20135). Bag-of-visual-words and spatial extensions for land-use classification. 
Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA.","DOI":"10.1145\/1869790.1869829"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TGRS.2017.2685945","article-title":"AID: A benchmark data set for performance evaluation of aerial scene classification","volume":"55","author":"Xia","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/j.isprsjprs.2018.01.004","article-title":"PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval","volume":"145","author":"Zhou","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Zeng, D., Chen, S., Chen, B., and Li, S. (2018). Improving remote sensing scene classification by integrating global-context and local-object features. Remote Sens., 10.","DOI":"10.3390\/rs10050734"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1007\/s11760-015-0804-2","article-title":"Land-use scene classification using multi-scale completed local binary patterns","volume":"10","author":"Chen","year":"2015","journal-title":"Signal Image Video Process."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"2149","DOI":"10.1080\/01431161.2016.1171928","article-title":"Using convolutional features and a sparse autoencoder for land-use scene classification","volume":"37","author":"Othman","year":"2016","journal-title":"Int. J. Remote Sens."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Zhang, W., Tang, P., and Zhao, L. (2019). Remote sensing image scene classification using CNN-CapsNet. 
Remote Sens., 11.","DOI":"10.3390\/rs11050494"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1016\/j.isprsjprs.2018.01.023","article-title":"Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification","volume":"138","author":"Anwer","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"7918","DOI":"10.1109\/TGRS.2020.3044655","article-title":"Enhanced feature pyramid network with deep semantic embedding for remote sensing scene classification","volume":"59","author":"Wang","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Shafaey, M.A., Salem, M.A.M., Ebeid, H.M., Al-Berry, M.N., and Tolba, M.F. (2018, January 18\u201319). Comparison of CNNs for remote sensing scene classification. Proceedings of the 2018 13th International Conference on Computer Engineering and Systems (ICCES), Cairo, Egypt.","DOI":"10.1109\/ICCES.2018.8639467"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"675","DOI":"10.26483\/ijarcs.v9i2.5897","article-title":"Effect of texture feature combination on satellite image classification","volume":"9","author":"Altaei","year":"2018","journal-title":"Int. J. Adv. Res. Comput. Sci."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Tian, Q., Wan, S., Jin, P., Xu, J., Zou, C., and Li, X. (2018, January 21\u201322). A novel feature fusion with self-adaptive weight method based on deep learning for image classification. 
Proceedings of the 19th Pacific-Rim Conference on Multimedia, Hefei, China.","DOI":"10.1007\/978-3-030-00776-8_39"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/9\/2042\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:59:58Z","timestamp":1760137198000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/9\/2042"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,24]]},"references-count":60,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["rs14092042"],"URL":"https:\/\/doi.org\/10.3390\/rs14092042","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,24]]}}}