{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T14:32:46Z","timestamp":1775745166567,"version":"3.50.1"},"reference-count":59,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T00:00:00Z","timestamp":1716940800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Super-resolution (SR) for satellite remote sensing images has been recognized as crucial and has found widespread applications across various scenarios. Previous SR methods were usually built upon Convolutional Neural Networks and Transformers, which suffer from either limited receptive fields or a lack of prior assumptions. To address these issues, we propose ESatSR, a novel SR method based on state space models. We utilize the 2D Selective Scan to obtain an enhanced capability in modeling long-range dependencies, which contributes to a wide receptive field. A Spatial Context Interaction Module (SCIM) and an Enhanced Image Reconstruction Module (EIRM) are introduced to combine image-related prior knowledge into our model, therefore guiding the process of feature extraction and reconstruction. Tailored for remote sensing images, the interaction of multi-scale spatial context and image features is leveraged to enhance the network\u2019s capability in capturing features of small targets. Comprehensive experiments show that ESatSR demonstrates state-of-the-art performance on both OLI2MSI and RSSCN7 datasets, with the highest PSNRs of 42.11 dB and 31.42 dB, respectively. Extensive ablation studies illustrate the effectiveness of our module design.<\/jats:p>","DOI":"10.3390\/rs16111956","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T03:45:08Z","timestamp":1717040708000},"page":"1956","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["ESatSR: Enhancing Super-Resolution for Satellite Remote Sensing Images with State Space Model and Spatial Context"],"prefix":"10.3390","volume":"16","author":[{"given":"Yinxiao","family":"Wang","sequence":"first","affiliation":[{"name":"School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China"},{"name":"Innovation Academy for Microsatellites, Chinese Academy of Sciences, Shanghai 201210, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-0165-6071","authenticated-orcid":false,"given":"Wei","family":"Yuan","sequence":"additional","affiliation":[{"name":"College of Biomedical Engineering, Sichuan University, Chengdu 610065, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9354-6093","authenticated-orcid":false,"given":"Fang","family":"Xie","sequence":"additional","affiliation":[{"name":"Innovation Academy for Microsatellites, Chinese Academy of Sciences, Shanghai 201210, China"},{"name":"School of Optoelectronics, University of Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Baojun","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China"},{"name":"Innovation Academy for Microsatellites, Chinese Academy of Sciences, Shanghai 201210, China"},{"name":"School of Optoelectronics, University of Chinese Academy of Sciences, Beijing 100094, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,5,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"9012","DOI":"10.1109\/JSTARS.2021.3108777","article-title":"Unsupervised hyperspectral image change detection via deep learning self-generated credible labels","volume":"14","author":"Li","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_2","first-page":"1","article-title":"A spectral and spatial attention network for change detection in hyperspectral images","volume":"60","author":"Gong","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"116793","DOI":"10.1016\/j.eswa.2022.116793","article-title":"Remote sensing image super-resolution and object detection: Benchmark and state of the art","volume":"197","author":"Wang","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Gong, H., Mu, T., Li, Q., Dai, H., Li, C., He, Z., Wang, W., Han, F., Tuniyazi, A., and Li, H. (2022). Swin-transformer-enabled YOLOv5 with attention mechanism for small object detection on satellite images. Remote Sens., 14.","DOI":"10.3390\/rs14122861"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1697","DOI":"10.1007\/s11554-021-01113-y","article-title":"Image deconvolution for optical small satellite with deep learning and real-time GPU acceleration","volume":"18","author":"Ngo","year":"2021","journal-title":"J. Real-Time Image Process."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2024.3404062","article-title":"Robust optical and SAR image matching using attention-enhanced structural features","volume":"62","author":"Ye","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_7","first-page":"1","article-title":"Visible\/Infrared Image Registration Based on Region-Adaptive Contextual Multi-Features","volume":"62","author":"Zhao","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1109\/TPAMI.2015.2439281","article-title":"Image super-resolution using deep convolutional networks","volume":"38","author":"Dong","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ledig, C., Theis, L., Husz\u00e1r, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., and Wang, Z. (2017, January 21\u201326). Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.19"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kim, J., Lee, J.K., and Lee, K.M. (2016, January 27\u201330). Accurate image super-resolution using very deep convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.182"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, K., Zuo, W., Gu, S., and Zhang, L. (2017, January 21\u201326). Learning deep CNN denoiser prior for image restoration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.300"},{"key":"ref_12","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_13","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2022, January 23\u201327). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chen, H., Wang, Y., Guo, T., Xu, C., Deng, Y., Liu, Z., Ma, S., Xu, C., Xu, C., and Gao, W. (2021, January 20\u201325). Pre-trained image processing transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01212"},{"key":"ref_15","unstructured":"Cao, J., Li, Y., Zhang, K., Liang, J., and Van Gool, L. (2021). Video super-resolution transformer. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., and Timofte, R. (2021, January 10\u201317). Swinir: Image restoration using swin transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00210"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1007\/s41095-022-0274-8","article-title":"Pvt v2: Improved baselines with pyramid vision transformer","volume":"8","author":"Wang","year":"2022","journal-title":"Comput. Vis. Media"},{"key":"ref_18","first-page":"12077","article-title":"SegFormer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chen, X., Wang, X., Zhou, J., Qiao, Y., and Dong, C. (2023, January 17\u201324). Activating more pixels in image super-resolution transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.02142"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., and Fu, Y. (2018, January 8\u201314). Image super-resolution using very deep residual channel attention networks. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_18"},{"key":"ref_21","unstructured":"Liu, Y., Tian, Y., Zhao, Y., Yu, H., Xie, L., Wang, Y., Ye, Q., and Liu, Y. (2024). Vmamba: Visual state space model. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Dai, T., Cai, J., Zhang, Y., Xia, S.T., and Zhang, L. (2019, January 15\u201320). Second-order attention network for single image super-resolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01132"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Niu, B., Wen, W., Ren, W., Zhang, X., Yang, L., Wang, S., Zhang, K., Cao, X., and Shen, H. (2020, January 23\u201328). Single image super-resolution via a holistic attention network. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK. Part XII 16.","DOI":"10.1007\/978-3-030-58610-2_12"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Mei, Y., Fan, Y., and Zhou, Y. (2021, January 19\u201325). Image super-resolution with non-local sparse attention. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtually.","DOI":"10.1109\/CVPR46437.2021.00352"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lim, B., Son, S., Kim, H., Nah, S., and Mu Lee, K. (2017, January 21\u201326). Enhanced deep residual networks for single image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.151"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Tian, Y., Kong, Y., Zhong, B., and Fu, Y. (2018, January 18\u201323). Residual dense network for image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00262"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kim, J., Lee, J.K., and Lee, K.M. (2016, January 27\u201330). Deeply-recursive convolutional network for image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.181"},{"key":"ref_28","first-page":"3499","article-title":"Cross-scale internal graph neural network for image super-resolution","volume":"33","author":"Zhou","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., and Change Loy, C. (2019, January 8\u201314). Esrgan: Enhanced super-resolution generative adversarial networks. Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany.","DOI":"10.1007\/978-3-030-11021-5_5"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, X., Xie, L., Dong, C., and Shan, Y. (2021, January 11\u201317). Real-esrgan: Training real-world blind super-resolution with pure synthetic data. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00217"},{"key":"ref_31","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 18\u201324). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning. PMLR, Virtual Event."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., and Wang, M. (2022, January 23). Swin-unet: Unet-like pure transformer for medical image segmentation. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25066-8_9"},{"key":"ref_33","unstructured":"Wu, B., Xu, C., Dai, X., Wan, A., Zhang, P., Yan, Z., Tomizuka, M., Gonzalez, J., Keutzer, K., and Vajda, P. (2020). Visual transformers: Token-based image representation and processing for computer vision. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H. (2021, January 19\u201325). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtually.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1243","DOI":"10.1109\/LGRS.2017.2704122","article-title":"Super-resolution for remote sensing images via local\u2013global combined network","volume":"14","author":"Lei","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_37","first-page":"1","article-title":"Contextual transformation network for lightweight remote-sensing image super-resolution","volume":"60","author":"Wang","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gu, J., Sun, X., Zhang, Y., Fu, K., and Wang, L. (2019). Deep residual squeeze and excitation network for remote sensing image super-resolution. Remote Sens., 11.","DOI":"10.3390\/rs11151817"},{"key":"ref_39","first-page":"1","article-title":"FeNet: Feature enhancement network for lightweight remote-sensing image super-resolution","volume":"60","author":"Wang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_40","first-page":"1","article-title":"Hybrid-scale self-similarity exploitation for remote sensing image super-resolution","volume":"60","author":"Lei","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_41","first-page":"1","article-title":"Transformer-based multistage enhancement for remote sensing image super-resolution","volume":"60","author":"Lei","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1115\/1.3662552","article-title":"A new approach to linear filtering and prediction problems","volume":"82","author":"Kalman","year":"1960","journal-title":"Trans. ASME D"},{"key":"ref_43","unstructured":"Gu, A., Goel, K., and R\u00e9, C. (2021). Efficiently modeling long sequences with structured state spaces. arXiv."},{"key":"ref_44","first-page":"572","article-title":"Combining recurrent, convolutional, and continuous-time models with linear state space layers","volume":"34","author":"Gu","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_45","first-page":"22982","article-title":"Diagonal state spaces are as effective as structured state spaces","volume":"35","author":"Gupta","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_46","unstructured":"Li, Y., Cai, T., Zhang, Y., Chen, D., and Dey, D. (2022). What makes convolutional models great on long sequence modeling?. arXiv."},{"key":"ref_47","unstructured":"Gu, A., and Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., and Zhang, L. (2021, January 11\u201317). Cvt: Introducing convolutions to vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Lin, T.Y., and Le, Q.V. (2019, January 15\u201320). Nas-fpn: Learning scalable feature pyramid architecture for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00720"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Agustsson, E., and Timofte, R. (2017, January 21\u201326). Ntire 2017 challenge on single image super-resolution: Dataset and study. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.150"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"9829706","DOI":"10.34133\/2021\/9829706","article-title":"Multisensor remote sensing imagery super-resolution with conditional GAN","volume":"2021","author":"Wang","year":"2021","journal-title":"J. Remote Sens."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"2321","DOI":"10.1109\/LGRS.2015.2475299","article-title":"Deep Learning Based Feature Selection for Remote Sensing Scene Classification","volume":"12","author":"Zou","year":"2015","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zeng, H., and Zhang, L. (2021, January 20\u201324). Edge-oriented convolution block for real-time super resolution on mobile devices. Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event.","DOI":"10.1145\/3474085.3475291"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"5799","DOI":"10.1109\/TGRS.2019.2902431","article-title":"Edge-enhanced GAN for remote sensing image superresolution","volume":"57","author":"Jiang","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"5183","DOI":"10.1109\/TGRS.2020.3009918","article-title":"Remote Sensing Image Super-Resolution via Mixed High-Order Attention Network","volume":"59","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_56","first-page":"158","article-title":"Pytorch: An imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_57","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Zhang, R., Isola, P., Efros, A.A., Shechtman, E., and Wang, O. (2018, January 18\u201323). The unreasonable effectiveness of deep features as a perceptual metric. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00068"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/11\/1956\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:50:22Z","timestamp":1760107822000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/11\/1956"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,29]]},"references-count":59,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["rs16111956"],"URL":"https:\/\/doi.org\/10.3390\/rs16111956","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,29]]}}}