{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:17:30Z","timestamp":1760235450320,"version":"build-2065373602"},"reference-count":49,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2021,8,19]],"date-time":"2021-08-19T00:00:00Z","timestamp":1629331200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>With the improvement of the quality and resolution of remote sensing (RS) images, scene recognition tasks have played an important role in the RS community. However, due to the special bird\u2019s eye view image acquisition mode of imaging sensors, it is still challenging to construct a discriminate representation of diverse and complex scenes to improve RS image recognition performance. Capsule networks that can learn the spatial relationship between the features in an image has a good image classification performance. However, the original capsule network is not suitable for images with a complex background. To address the above issues, this paper proposes a novel end-to-end capsule network termed DS-CapsNet, in which a new multi-scale feature enhancement module and a new Caps-SoftPool method are advanced by aggregating the advantageous attributes of the residual convolution architecture, Diverse Branch Block (DBB), Squeeze and Excitation (SE) block, and the Caps-SoftPool method. By using the residual DBB, multiscale features can be extracted and fused to recover a semantic strong feature representation. By adopting SE, the informative features are emphasized, and the less salient features are weakened. The new Caps-SoftPool method can reduce the number of parameters that are needed in order to prevent an over-fitting problem. 
The novel DS-CapsNet achieves a competitive and promising performance for RS image recognition by using high-quality and robust capsule representation. The extensive experiments on two challenging datasets, AID and NWPU-RESISC45, demonstrate the robustness and superiority of the proposed DS-CapsNet in scene recognition tasks.<\/jats:p>","DOI":"10.3390\/s21165575","type":"journal-article","created":{"date-parts":[[2021,8,19]],"date-time":"2021-08-19T04:13:54Z","timestamp":1629346434000},"page":"5575","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Scene Recognition Using Deep Softpool Capsule Network Based on Residual Diverse Branch Block"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1361-2893","authenticated-orcid":false,"given":"Chunyuan","family":"Wang","sequence":"first","affiliation":[{"name":"School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China"}]},{"given":"Yang","family":"Wu","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Satellite Engineering, Shanghai 200240, China"}]},{"given":"Yihan","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5256-2676","authenticated-orcid":false,"given":"Yiping","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,8,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ren, Y., Yu, Y., and Guan, H. (2020). DA-CapsUNet: A Dual-Attention Capsule U-Net for Road Extraction from Remote Sensing Imagery. 
Remote Sens., 12.","DOI":"10.3390\/rs12182866"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1109\/TGRS.2013.2241444","article-title":"Unsupervised feature learning for aerial scene classification","volume":"52","author":"Cheriyadat","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"8588","DOI":"10.1080\/01431161.2013.845925","article-title":"Extreme value theory-based calibration for the fusion of multiple features in high-resolution satellite scene classification","volume":"34","author":"Shao","year":"2013","journal-title":"Int. J. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1080\/2150704X.2017.1422873","article-title":"An object-based supervised classification framework for very-high-resolution remote sensing images using convolutional neural networks","volume":"9","author":"Zhang","year":"2018","journal-title":"Remote Sens. Lett."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Petrovska, B., Atanasovapacemska, T., Corizzo, R., Mignone, P., Lameski, P., and Zdravevski, E. (2020). Aerial Scene Classification through Fine-Tuning with Adaptive Learning Rates and Label Smoothing. Appl. Sci., 10.","DOI":"10.3390\/app10175792"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, C., Liu, X., Zhao, X., and Wang, Y. (2016). An Effective Correction Method for Seriously Oblique Remote Sensing Images Based on Multi-View Simulation and a Piecewise Model. Sensors, 16.","DOI":"10.3390\/s16101725"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2811","DOI":"10.1109\/TGRS.2017.2783902","article-title":"When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs","volume":"56","author":"Cheng","year":"2018","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"7109","DOI":"10.1109\/TGRS.2018.2848473","article-title":"Scene classification based on multiscale convolutional neural network","volume":"56","author":"Liu","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2030","DOI":"10.1109\/JSTARS.2021.3051569","article-title":"Attention Consistent Network for Remote Sensing Scene Classification","volume":"14","author":"Tang","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/TGRS.2019.2931801","article-title":"Remote Sensing Scene Classification by Gated Bidirectional Network","volume":"58","author":"Sun","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1080\/01431160903475266","article-title":"Object oriented classification of high-resolution remote sensing imagery based on an improved colour structure code and a support vector machine","volume":"31","author":"Li","year":"2010","journal-title":"Int. J. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1899","DOI":"10.1109\/JSTARS.2012.2228254","article-title":"Indexing of remote sensing images with different resolutions by multiple features","volume":"6","author":"Luo","year":"2013","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1007\/s11760-015-0804-2","article-title":"Land-use scene classification using multi-scale completed local binary patterns","volume":"10","author":"Chen","year":"2016","journal-title":"Signal. 
Image Video Process."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1016\/j.isprsjprs.2019.04.015","article-title":"Deep learning in remote sensing applications: A meta-analysis and review","volume":"152","author":"Ma","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"3735","DOI":"10.1109\/JSTARS.2020.3005403","article-title":"Remote sensing image scene classification meets deep learning: Challenges, methods, benchmarks, and opportunities","volume":"13","author":"Cheng","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Abdollahi, A., Pradhan, B., Shukla, N., Chakraborty, S., and Alamri, A. (2020). Deep learning approaches applied to remote sensing datasets for road extraction: A state-of-the-art review. Remote Sens., 12.","DOI":"10.3390\/rs12091444"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Liu, Y., Zhong, Y., Fei, F., Zhu, Q., and Qin, Q. (2018). Scene Classification Based on a Deep Random-Scale Stretched Convolutional Neural Network. Remote Sens., 10.","DOI":"10.3390\/rs10030444"},{"key":"ref_18","unstructured":"Castelluccio, M., Poggi, G., Sansone, C., and Verdoliva, L. (2015). Land use classification in remote sensing images by convolutional neural networks. 
arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1016\/j.patcog.2016.07.001","article-title":"Towards better exploiting convolutional neural networks for remote sensing scene classification","volume":"61","author":"Nogueira","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"9667","DOI":"10.1007\/s11042-018-6548-6","article-title":"Analysis of the inter-dataset representation ability of deep features for high spatial resolution remote sensing image scene classification","volume":"78","author":"Zhao","year":"2019","journal-title":"Multimed. Tools Appl."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Hinton, G.E., Krizhevsky, A., and Wang, S.D. (2011). Transforming auto-encoders. International Conference on Artificial Neural Networks, Springer.","DOI":"10.1007\/978-3-642-21735-7_6"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Afshar, P., Mohammadi, A., and Plataniotis, K.N. (2018). Brain Tumor Type Classification via Capsule Networks. arXiv.","DOI":"10.1109\/ICIP.2018.8451379"},{"key":"ref_23","unstructured":"LaLonde, R., and Bagci, U. (2018). Capsules for object segmentation. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Deng, F., Pu, S., Chen, X., Shi, Y., Yuan, T., and Pu, S. (2018). Hyperspectral Image Classification with Capsule Network Using Limited Training Samples. Sensors, 18.","DOI":"10.3390\/s18093153"},{"key":"ref_25","unstructured":"Zhao, W., Ye, J., Yang, M., Lei, Z., Zhang, S., and Zhao, Z. (2018). Investigating capsule networks with dynamic routing for text classification. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1850","DOI":"10.1109\/LSP.2018.2873892","article-title":"MS-CapsNet: A novel multi-scale capsule network","volume":"25","author":"Xiang","year":"2018","journal-title":"IEEE Signal. Process. 
Lett."},{"key":"ref_27","unstructured":"Sabour, S., Frosst, N., and Hinton, G.E. (2017). Dynamic routing between capsules. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_29","first-page":"1106","article-title":"ImageNet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ding, X., Zhang, X., Han, J., and Ding, G. (2021, January 18\u201320). Diverse Branch Block: Building a Convolution as an Inception-like Unit. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Kuala Lumpur, Malaysia.","DOI":"10.1109\/CVPR46437.2021.01074"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2011","DOI":"10.1109\/TPAMI.2019.2913372","article-title":"Squeeze-and-Excitation Networks","volume":"42","author":"Hu","year":"2020","journal-title":"IEEE Trans. Pattern Anal."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1254","DOI":"10.1109\/34.730558","article-title":"A model of saliency-based visual attention for rapid scene analysis","volume":"20","author":"Itti","year":"1998","journal-title":"IEEE Trans. Pattern Anal."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cao, C., Liu, X., Yang, Y., Yu, Y., Wang, J., Wang, Z., Huang, Y., Wang, L., Huang, C., and Xu, W. (2015, January 7\u201313). Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks (Conference Paper). 
Proceedings of the IEEE International Conference on Computer Vision 2015, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.338"},{"key":"ref_34","first-page":"2204","article-title":"Recurrent Models of Visual Attention","volume":"27","author":"Mnih","year":"2014","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Stergiou, A., Poppe, R., and Kalliatakis, G. (2021). Refining Activation Downsampling with SoftPool. arXiv.","DOI":"10.1109\/ICCV48922.2021.01019"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TGRS.2017.2685945","article-title":"AID: A benchmark data set for performance evaluation of aerial scene classification","volume":"55","author":"Xia","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"2889","DOI":"10.1109\/JSTARS.2017.2683799","article-title":"Fusing local and global features for high-resolution scene classification","volume":"10","author":"Bian","year":"2017","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"4775","DOI":"10.1109\/TGRS.2017.2700322","article-title":"Deep feature fusion for VHR remote sensing scene classification","volume":"55","author":"Chaib","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Anwer, R.M., Khan, F.S., van de Weijer, J., Molinier, M., and Laaksonen, J. (2017). Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification. arXiv.","DOI":"10.1016\/j.isprsjprs.2018.01.023"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1109\/LGRS.2017.2786241","article-title":"Aerial scene classification via multilevel fusion based on deep convolutional neural networks","volume":"15","author":"Yu","year":"2018","journal-title":"IEEE Geosci. 
Remote Sens. Lett."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1894","DOI":"10.1109\/LGRS.2019.2960026","article-title":"Multilayer Feature Fusion Network for Scene classification in Remote Sensing","volume":"17","author":"Xu","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1109\/LGRS.2019.2894399","article-title":"Siamese convolutional neural networks for remote sensing scene classification","volume":"16","author":"Liu","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Fan, R., Wang, L., Feng, R., and Zhu, Y. (August, January 28). Attention based residual network for high-resolution remote sensing imagery scene classification. Proceedings of the IGARSS 2019\u20142019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8900199"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"494","DOI":"10.3390\/rs11050494","article-title":"Remote Sensing Image Scene Classification Using CNN-CapsNet","volume":"11","author":"Wei","year":"2019","journal-title":"Remote Sens."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1865","DOI":"10.1109\/JPROC.2017.2675998","article-title":"Remote sensing image scene classification: Benchmark and state of the art","volume":"105","author":"Cheng","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"8639367","DOI":"10.1155\/2018\/8639367","article-title":"A two-stream deep fusion framework for high-resolution aerial scene classification","volume":"2018","author":"Yu","year":"2018","journal-title":"Comput. Intell. 
Neurosci."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1109\/LGRS.2017.2731997","article-title":"Remote Sensing Image Scene Classification Using Bag of Convolutional Features","volume":"14","author":"Cheng","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Yu, Y., and Liu, F. (2018). Dense connectivity based two-stream deep feature fusion framework for aerial scene classification. Remote Sens., 10.","DOI":"10.3390\/rs10071158"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"6899","DOI":"10.1109\/TGRS.2018.2845668","article-title":"Remote sensing scene classification using multilayer stacked covariance pooling","volume":"56","author":"He","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/16\/5575\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:47:00Z","timestamp":1760165220000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/16\/5575"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,19]]},"references-count":49,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2021,8]]}},"alternative-id":["s21165575"],"URL":"https:\/\/doi.org\/10.3390\/s21165575","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,8,19]]}}}