{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T02:06:40Z","timestamp":1774663600588,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T00:00:00Z","timestamp":1631836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Remote sensing products with high temporal and spatial resolution can be hardly obtained under the constrains of existing technology and cost. Therefore, the spatiotemporal fusion of remote sensing images has attracted considerable attention. Spatiotemporal fusion algorithms based on deep learning have gradually developed, but they also face some problems. For example, the amount of data affects the model\u2019s ability to learn, and the robustness of the model is not high. The features extracted through the convolution operation alone are insufficient, and the complex fusion method also introduces noise. To solve these problems, we propose a multi-stream fusion network for remote sensing spatiotemporal fusion based on Transformer and convolution, called MSNet. We introduce the structure of the Transformer, which aims to learn the global temporal correlation of the image. At the same time, we also use a convolutional neural network to establish the relationship between input and output and to extract features. Finally, we adopt the fusion method of average weighting to avoid using complicated methods to introduce noise. To test the robustness of MSNet, we conducted experiments on three datasets and compared them with four representative spatiotemporal fusion algorithms to prove the superiority of MSNet (Spectral Angle Mapper (SAM) &lt; 0.193 on the CIA dataset, erreur relative global adimensionnelle de synthese (ERGAS) &lt; 1.687 on the LGC dataset, and root mean square error (RMSE) &lt; 0.001 on the AHB dataset).<\/jats:p>","DOI":"10.3390\/rs13183724","type":"journal-article","created":{"date-parts":[[2021,9,22]],"date-time":"2021-09-22T03:47:35Z","timestamp":1632282455000},"page":"3724","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":50,"title":["MSNet: A Multi-Stream Fusion Network for Remote Sensing Spatiotemporal Fusion Based on Transformer and Convolution"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9033-8245","authenticated-orcid":false,"given":"Weisheng","family":"Li","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China"}]},{"given":"Dongwen","family":"Cao","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China"}]},{"given":"Yidong","family":"Peng","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China"}]},{"given":"Chao","family":"Yang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1228","DOI":"10.1109\/36.701075","article-title":"The Moderate Resolution Imaging Spectroradiometer (MODIS): Land remote sensing for global change research","volume":"36","author":"Justice","year":"1998","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1016\/j.rse.2014.09.015","article-title":"Using satellite remote sensing data to estimate the high-resolution distribution of ground-level PM2.5","volume":"156","author":"Lin","year":"2015","journal-title":"Remote Sens. Environ."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1109\/TCYB.2016.2605044","article-title":"Simultaneous spectral-spatial feature selection and extraction for hyperspectral images","volume":"48","author":"Zhang","year":"2016","journal-title":"IEEE Trans. Cybern."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"799","DOI":"10.14358\/PERS.72.7.799","article-title":"Object-based detailed vegetation classification with airborne high spatial resolution remote sensing imagery","volume":"72","author":"Yu","year":"2006","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.rse.2006.04.014","article-title":"Real-time monitoring and short-term forecasting of land surface phenology","volume":"104","author":"White","year":"2006","journal-title":"Remote Sens. Environ."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1016\/j.rse.2011.08.024","article-title":"A review of large area monitoring of land cover change using Landsat data","volume":"122","author":"Hansen","year":"2012","journal-title":"Remote Sens. Environ."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2207","DOI":"10.1109\/TGRS.2006.872081","article-title":"On the blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance","volume":"44","author":"Gao","year":"2006","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1988","DOI":"10.1016\/j.rse.2009.05.011","article-title":"Generation of dense time series synthetic Landsat data through data blending with MODIS using a spatial and temporal adaptive reflectance fusion model","volume":"113","author":"Hilker","year":"2009","journal-title":"Remote Sens. Environ."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2610","DOI":"10.1016\/j.rse.2010.05.032","article-title":"An enhanced spatial and temporal adaptive reflectance fusion model for complex heterogeneous regions","volume":"114","author":"Zhu","year":"2010","journal-title":"Remote Sens. Environ."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1613","DOI":"10.1016\/j.rse.2009.03.007","article-title":"A new data fusion model for high spatial-and temporal-resolution mapping of forest disturbance based on Landsat and MODIS","volume":"113","author":"Hilker","year":"2009","journal-title":"Remote Sens. Environ."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1212","DOI":"10.1109\/36.763276","article-title":"Unmixing-based multisensor multiresolution image fusion","volume":"37","author":"Zhukov","year":"1999","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"063507","DOI":"10.1117\/1.JRS.6.063507","article-title":"Use of MODIS and Landsat time series data to generate high-resolution temporal synthetic Landsat data using a spatial and temporal reflectance fusion model","volume":"6","author":"Wu","year":"2012","journal-title":"J. Appl. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/j.rse.2015.11.016","article-title":"A flexible spatiotemporal method for fusing satellite images with different resolutions","volume":"172","author":"Zhu","year":"2016","journal-title":"Remote Sens. Environ."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3707","DOI":"10.1109\/TGRS.2012.2186638","article-title":"Spatiotemporal reflectance fusion via sparse representation","volume":"50","author":"Huang","year":"2012","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Belgiu, M., and Stein, A. (2019). Spatiotemporal image fusion in remote sensing. Remote Sens., 11.","DOI":"10.3390\/rs11070818"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"7126","DOI":"10.1109\/TGRS.2017.2742529","article-title":"Spatiotemporal fusion of MODIS and Landsat-7 reflectance images via compressed sensing","volume":"55","author":"Wei","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2039","DOI":"10.1109\/LGRS.2016.2622726","article-title":"Fast and accurate spatiotemporal fusion based upon extreme learning machine","volume":"13","author":"Liu","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1109\/JSTARS.2018.2797894","article-title":"Spatiotemporal satellite image fusion using deep convolutional neural networks","volume":"11","author":"Song","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"6552","DOI":"10.1109\/TGRS.2019.2907310","article-title":"StfNet: A two-stream convolutional neural network for spatiotemporal image fusion","volume":"57","author":"Liu","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Tan, Z., Yue, P., Di, L., and Tang, J. (2018). Deriving high spatiotemporal remote sensing images using deep convolutional network. Remote Sens., 10.","DOI":"10.3390\/rs10071066"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Tan, Z., Di, L., Zhang, M., Guo, L., and Gao, M. (2019). An enhanced deep convolutional model for spatiotemporal image fusion. Remote Sens., 11.","DOI":"10.3390\/rs11242898"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"5851","DOI":"10.1109\/TGRS.2020.3023432","article-title":"CycleGAN-STF: Spatiotemporal fusion via CycleGAN-based image generation","volume":"59","author":"Chen","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1808","DOI":"10.1109\/TGRS.2020.2999943","article-title":"Spatiotemporal fusion of land surface temperature based on a convolutional neural network","volume":"59","author":"Yin","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_24","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_25","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, January 3\u20137). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. Proceedings of the ICLR 2021, Virtual Conference, Formerly, Vienna, Austria."},{"key":"ref_26","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_28","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 7\u20139). Very deep convolutional networks for large-scale image recognition. Proceedings of the ICLR 2015, San Diego, CA, USA."},{"key":"ref_29","unstructured":"Glorot, X., Bordes, A., and Bengio, Y. (2011, January 11\u201313). Deep sparse rectifier neural networks. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Huber, P.J. (1992). Robust estimation of a location parameter. Breakthroughs in Statistics, Springer.","DOI":"10.1007\/978-1-4612-4380-9_35"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.rse.2013.02.007","article-title":"Assessing the accuracy of blending Landsat\u2013MODIS surface reflectances in two landscapes with contrasting spatial and temporal dynamics: A framework for algorithm selection","volume":"133","author":"Emelyanova","year":"2013","journal-title":"Remote Sens. Environ."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"140302","DOI":"10.1007\/s11432-019-2805-y","article-title":"A new sensor bias-driven spatio-temporal fusion model based on convolutional neural networks","volume":"63","author":"Li","year":"2020","journal-title":"Sci. China Inf. Sci."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"140301","DOI":"10.1007\/s11432-019-2785-y","article-title":"Spatio-temporal fusion for remote sensing data: An overview and new benchmark","volume":"63","author":"Li","year":"2020","journal-title":"Sci. China Inf. Sci."},{"key":"ref_34","unstructured":"Yuhas, R.H., Goetz, A.F., and Boardman, J.W. (1992, January 1\u20135). Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. Proceedings of the Summaries 3rd Annual JPL Airborne Earth Science Workshop, Pasadena, CA, USA."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"3880","DOI":"10.1109\/TGRS.2009.2029094","article-title":"Pansharpening quality assessment using the modulation transfer functions of instruments","volume":"47","author":"Khan","year":"2009","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_37","unstructured":"Ponomarenko, N., Ieremeiev, O., Lukin, V., Egiazarian, K., and Carli, M. (2011, January 23\u201325). Modified image visual quality metrics for contrast change and mean shift accounting. Proceedings of the 2011 11th International Conference the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), Polyana-Svalyava, Ukraine."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/18\/3724\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:01:16Z","timestamp":1760166076000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/18\/3724"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,17]]},"references-count":37,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["rs13183724"],"URL":"https:\/\/doi.org\/10.3390\/rs13183724","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,17]]}}}