{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T02:31:29Z","timestamp":1768617089970,"version":"3.49.0"},"reference-count":50,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2022,9,11]],"date-time":"2022-09-11T00:00:00Z","timestamp":1662854400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61972060"],"award-info":[{"award-number":["61972060"]}]},{"name":"National Natural Science Foundation of China","award":["U1713213"],"award-info":[{"award-number":["U1713213"]}]},{"name":"National Natural Science Foundation of China","award":["62027827"],"award-info":[{"award-number":["62027827"]}]},{"name":"National Natural Science Foundation of China","award":["2019YFE0110800"],"award-info":[{"award-number":["2019YFE0110800"]}]},{"name":"National Natural Science Foundation of China","award":["cstc2020jcyj-zdxmX0025"],"award-info":[{"award-number":["cstc2020jcyj-zdxmX0025"]}]},{"name":"National Natural Science Foundation of China","award":["cstc2019cxcyljrc-td0270"],"award-info":[{"award-number":["cstc2019cxcyljrc-td0270"]}]},{"name":"National Key Research and Development Program of China","award":["61972060"],"award-info":[{"award-number":["61972060"]}]},{"name":"National Key Research and Development Program of China","award":["U1713213"],"award-info":[{"award-number":["U1713213"]}]},{"name":"National Key Research and Development Program of China","award":["62027827"],"award-info":[{"award-number":["62027827"]}]},{"name":"National Key Research and Development Program of China","award":["2019YFE0110800"],"award-info":[{"award-number":["2019YFE0110800"]}]},{"name":"National Key Research and Development Program of China","award":["cstc2020jcyj-zdxmX0025"],"award-info":[{"award-number":["cstc2020jcyj-zdxmX0025"]}]},{"name":"National Key 
Research and Development Program of China","award":["cstc2019cxcyljrc-td0270"],"award-info":[{"award-number":["cstc2019cxcyljrc-td0270"]}]},{"name":"Natural Science Foundation of Chongqing","award":["61972060"],"award-info":[{"award-number":["61972060"]}]},{"name":"Natural Science Foundation of Chongqing","award":["U1713213"],"award-info":[{"award-number":["U1713213"]}]},{"name":"Natural Science Foundation of Chongqing","award":["62027827"],"award-info":[{"award-number":["62027827"]}]},{"name":"Natural Science Foundation of Chongqing","award":["2019YFE0110800"],"award-info":[{"award-number":["2019YFE0110800"]}]},{"name":"Natural Science Foundation of Chongqing","award":["cstc2020jcyj-zdxmX0025"],"award-info":[{"award-number":["cstc2020jcyj-zdxmX0025"]}]},{"name":"Natural Science Foundation of Chongqing","award":["cstc2019cxcyljrc-td0270"],"award-info":[{"award-number":["cstc2019cxcyljrc-td0270"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Remote sensing images with high temporal and spatial resolutions play a crucial role in land surface-change monitoring, vegetation monitoring, and natural disaster mapping. However, existing technical conditions and cost constraints make it very difficult to directly obtain remote sensing images with high temporal and spatial resolution. Consequently, spatiotemporal fusion technology for remote sensing images has attracted considerable attention. In recent years, deep learning-based fusion methods have been developed. In this study, to improve the accuracy and robustness of deep learning models and better extract the spatiotemporal information of remote sensing images, the existing multi-stream remote sensing spatiotemporal fusion network MSNet is improved using dilated convolution and an improved transformer encoder to develop an enhanced version called EMSNet. Dilated convolution is used to extract time information and reduce parameters. 
The transformer encoder is further improved to adapt to image-fusion technology and to extract spatiotemporal information effectively. A new weight strategy is used for fusion that substantially improves the prediction accuracy of the model, image quality, and fusion effect. The superiority of the proposed approach is confirmed by comparing it with six representative spatiotemporal fusion algorithms on three disparate datasets. Compared with MSNet, EMSNet improved SSIM by 15.3% on the CIA dataset, ERGAS by 92.1% on the LGC dataset, and RMSE by 92.9% on the AHB dataset.<\/jats:p>","DOI":"10.3390\/rs14184544","type":"journal-article","created":{"date-parts":[[2022,9,13]],"date-time":"2022-09-13T04:05:41Z","timestamp":1663041941000},"page":"4544","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Enhanced Multi-Stream Remote Sensing Spatiotemporal Fusion Network Based on Transformer and Dilated Convolution"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9033-8245","authenticated-orcid":false,"given":"Weisheng","family":"Li","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8171-7729","authenticated-orcid":false,"given":"Dongwen","family":"Cao","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China"}]},{"given":"Minghao","family":"Xiang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1228","DOI":"10.1109\/36.701075","article-title":"The Moderate Resolution Imaging Spectroradiometer (MODIS): Land remote sensing for global change research","volume":"36","author":"Justice","year":"1998","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1016\/j.rse.2014.09.015","article-title":"Using satellite remote sensing data to estimate the high-resolution distribution of ground-level PM2.5","volume":"156","author":"Lin","year":"2015","journal-title":"Remote Sens. Environ."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1109\/TCYB.2016.2605044","article-title":"Simultaneous spectral-spatial feature selection and extraction for hyperspectral images","volume":"48","author":"Zhang","year":"2016","journal-title":"IEEE Trans. Cybern."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"799","DOI":"10.14358\/PERS.72.7.799","article-title":"Object-based detailed vegetation classification with airborne high spatial resolution remote sensing imagery","volume":"72","author":"Yu","year":"2006","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.rse.2006.04.014","article-title":"Real-time monitoring and short-term forecasting of land surface phenology","volume":"104","author":"White","year":"2006","journal-title":"Remote Sens. Environ."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1016\/j.rse.2011.08.024","article-title":"A review of large area monitoring of land cover change using Landsat data","volume":"122","author":"Hansen","year":"2012","journal-title":"Remote Sens. 
Environ."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2207","DOI":"10.1109\/TGRS.2006.872081","article-title":"On the blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance","volume":"44","author":"Gao","year":"2006","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1988","DOI":"10.1016\/j.rse.2009.05.011","article-title":"Generation of dense time series synthetic Landsat data through data blending with MODIS using a spatial and temporal adaptive reflectance fusion model","volume":"113","author":"Hilker","year":"2009","journal-title":"Remote Sens. Environ."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1212","DOI":"10.1109\/36.763276","article-title":"Unmixing-based multisensor multiresolution image fusion","volume":"37","author":"Zhukov","year":"1999","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"063507","DOI":"10.1117\/1.JRS.6.063507","article-title":"Use of MODIS and Landsat time series data to generate high-resolution temporal synthetic Landsat data using a spatial and temporal reflectance fusion model","volume":"6","author":"Wu","year":"2012","journal-title":"J. Appl. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/j.rse.2015.11.016","article-title":"A flexible spatiotemporal method for fusing satellite images with different resolutions","volume":"172","author":"Zhu","year":"2016","journal-title":"Remote Sens. Environ."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2610","DOI":"10.1016\/j.rse.2010.05.032","article-title":"An enhanced spatial and temporal adaptive reflectance fusion model for complex heterogeneous regions","volume":"114","author":"Zhu","year":"2010","journal-title":"Remote Sens. 
Environ."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1613","DOI":"10.1016\/j.rse.2009.03.007","article-title":"A new data fusion model for high spatial-and temporal-resolution mapping of forest disturbance based on Landsat and MODIS","volume":"113","author":"Hilker","year":"2009","journal-title":"Remote Sens. Environ."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3707","DOI":"10.1109\/TGRS.2012.2186638","article-title":"Spatiotemporal reflectance fusion via sparse representation","volume":"50","author":"Huang","year":"2012","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Belgiu, M., and Stein, A. (2019). Spatiotemporal image fusion in remote sensing. Remote Sens., 11.","DOI":"10.3390\/rs11070818"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"7126","DOI":"10.1109\/TGRS.2017.2742529","article-title":"Spatiotemporal fusion of MODIS and Landsat-7 reflectance images via compressed sensing","volume":"55","author":"Wei","year":"2017","journal-title":"Geosci. Remote Sens."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2039","DOI":"10.1109\/LGRS.2016.2622726","article-title":"Fast and accurate spatiotemporal fusion based upon extreme learning machine","volume":"13","author":"Liu","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1109\/JSTARS.2018.2797894","article-title":"Spatiotemporal satellite image fusion using deep convolutional neural networks","volume":"11","author":"Song","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"6552","DOI":"10.1109\/TGRS.2019.2907310","article-title":"StfNet: A two-stream convolutional neural network for spatiotemporal image fusion","volume":"57","author":"Liu","year":"2019","journal-title":"Geosci. 
Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2020.3034752","article-title":"Spatiotemporal Remote Sensing Image Fusion Using Multiscale Two-Stream Convolutional Neural Networks","volume":"60","author":"Chen","year":"2021","journal-title":"Geosci. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Jia, D., Song, C., Cheng, C., Shen, S., Ning, L., and Hui, C. (2020). A novel deep learning-based spatiotemporal fusion method for combining satellite images with different resolutions using a two-stream convolutional neural network. Remote Sens., 12.","DOI":"10.3390\/rs12040698"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Jia, D., Cheng, C., Song, C., Shen, S., Ning, L., and Zhang, T. (2021). A Hybrid Deep Learning-Based Spatiotemporal Fusion Method for Combining Satellite Images with Different Resolutions. Remote Sens., 13.","DOI":"10.3390\/rs13040645"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Tan, Z., Yue, P., Di, L., and Tang, J. (2018). Deriving high spatiotemporal remote sensing images using deep convolutional network. Remote Sens., 10.","DOI":"10.3390\/rs10071066"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Tan, Z., Di, L., Zhang, M., Guo, L., and Gao, M. (2019). An enhanced deep convolutional model for spatiotemporal image fusion. Remote Sens., 11.","DOI":"10.3390\/rs11242898"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"5851","DOI":"10.1109\/TGRS.2020.3023432","article-title":"CycleGAN-STF: Spatiotemporal fusion via CycleGAN-based image generation","volume":"59","author":"Chen","year":"2020","journal-title":"Geosci. Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1808","DOI":"10.1109\/TGRS.2020.2999943","article-title":"Spatiotemporal fusion of land surface temperature based on a convolutional neural network","volume":"59","author":"Yin","year":"2020","journal-title":"Geosci. 
Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ao, Z., Sun, Y., Pan, X., and Xin, Q. (2022). Deep learning-based spatiotemporal data fusion using a patch-to-pixel mapping strategy and model comparisons. Geosci. Remote Sens.","DOI":"10.1109\/TGRS.2022.3154406"},{"key":"ref_28","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_29","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, January 3\u20137). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. Proceedings of the ICLR 2021, Virtual Conference (Formerly Vienna), Vienna, Austria."},{"key":"ref_30","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 18\u201324). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_31","unstructured":"Chu, X., Zhang, B., Tian, Z., Wei, X., and Xia, H. (2021). Do we really need explicit position encodings for vision transformers. arXiv."},{"key":"ref_32","first-page":"15908","article-title":"Transformer in transformer","volume":"34","author":"Han","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., and Zhang, L. (2021, January 11\u201317). Cvt: Introducing convolutions to vision transformers. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Yang, F., Yang, H., Fu, J., Lu, H., and Guo, B. (2021, January 20\u201325). Learning texture transformer network for image super-resolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR42600.2020.00583"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, W., Cao, D., Peng, Y., and Yang, C. (2021). MSNet: A multi-stream fusion network for remote sensing spatiotemporal fusion based on transformer and convolution. Remote Sens., 13.","DOI":"10.3390\/rs13183724"},{"key":"ref_37","first-page":"1","article-title":"SwinSTFM: Remote Sensing Spatiotemporal Fusion Using Swin Transformer","volume":"60","author":"Chen","year":"2022","journal-title":"Geosci. Remote Sens."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"4653","DOI":"10.1109\/JSTARS.2022.3179415","article-title":"MSFusion: Multistage for Remote Sensing Image Spatiotemporal Fusion Based on Texture Transformer and Convolutional Neural Network","volume":"15","author":"Yang","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_41","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_42","unstructured":"Glorot, X., Bordes, A., and Bengio, Y. (2011, January 11\u201313). Deep sparse rectifier neural networks. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Huber, P.J. (1992). Robust estimation of a location parameter. Breakthroughs in Statistics, Springer.","DOI":"10.1007\/978-1-4612-4380-9_35"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.rse.2013.02.007","article-title":"Assessing the accuracy of blending Landsat\u2013MODIS surface reflectances in two landscapes with contrasting spatial and temporal dynamics: A framework for algorithm selection","volume":"133","author":"Emelyanova","year":"2013","journal-title":"Remote Sens. Environ."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"140302","DOI":"10.1007\/s11432-019-2805-y","article-title":"A new sensor bias-driven spatio-temporal fusion model based on convolutional neural networks","volume":"63","author":"Li","year":"2020","journal-title":"Sci. China Inf. Sci."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"140301","DOI":"10.1007\/s11432-019-2785-y","article-title":"Spatio-temporal fusion for remote sensing data: An overview and new benchmark","volume":"63","author":"Li","year":"2020","journal-title":"Sci. 
China Inf. Sci."},{"key":"ref_47","unstructured":"Yuhas, R.H., Goetz, A.F., and Boardman, J.W. (1992, January 1\u20135). Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. Proceedings of the Summaries 3rd Annual JPL Airborne Geoscience Workshop, Pasadena, CA, USA."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"3880","DOI":"10.1109\/TGRS.2009.2029094","article-title":"Pansharpening quality assessment using the modulation transfer functions of instruments","volume":"47","author":"Khan","year":"2009","journal-title":"Geosci. Remote Sens."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_50","unstructured":"Ponomarenko, N., Ieremeiev, O., Lukin, V., Egiazarian, K., and Carli, M. (2011, January 23\u201325). Modified image visual quality metrics for contrast change and mean shift accounting. 
Proceedings of the 2011 11th International Conference the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), Polyana, Ukraine."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/18\/4544\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:29:33Z","timestamp":1760142573000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/18\/4544"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,11]]},"references-count":50,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["rs14184544"],"URL":"https:\/\/doi.org\/10.3390\/rs14184544","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,11]]}}}