{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,16]],"date-time":"2025-07-16T13:55:23Z","timestamp":1752674123097,"version":"3.37.3"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,12,15]],"date-time":"2022-12-15T00:00:00Z","timestamp":1671062400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,15]],"date-time":"2022-12-15T00:00:00Z","timestamp":1671062400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62201458","61901384"],"award-info":[{"award-number":["62201458","61901384"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61231016"],"award-info":[{"award-number":["61231016"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100007128","name":"Natural Science Foundation of Shaanxi Province","doi-asserted-by":"publisher","award":["2022JQ-577"],"award-info":[{"award-number":["2022JQ-577"]}],"id":[{"id":"10.13039\/501100007128","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing"},{"name":"Special Construction Fund for Key Disciplines of Shaanxi Provincial Higher Education"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. 
Syst."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Video super-resolution (VSR) aims to recover high-resolution (HR) content from low-resolution (LR) observations by compositing the spatial\u2013temporal information in the LR frames. Propagating and aggregating this spatial\u2013temporal information is therefore crucial. Recently, while transformers have shown impressive performance on high-level vision tasks, few attempts have been made to apply them to image restoration, especially to VSR. Moreover, previous transformers process spatial\u2013temporal information simultaneously, which easily synthesizes confused textures, and their high computational cost limits their development. Towards this end, we construct a novel bidirectional recurrent VSR architecture. Our model disentangles the task of learning spatial\u2013temporal information into two easier sub-tasks; each sub-task focuses on propagating and aggregating specific information with a multi-scale transformer-based design, which alleviates the difficulty of learning. Additionally, an attention-guided motion compensation module is applied to remove the influence of misalignment between frames. 
Experiments on three widely used benchmark datasets show that, relying on superior feature correlation learning, the proposed network outperforms previous state-of-the-art methods, especially in recovering fine details.<\/jats:p>","DOI":"10.1007\/s40747-022-00944-x","type":"journal-article","created":{"date-parts":[[2022,12,15]],"date-time":"2022-12-15T07:02:51Z","timestamp":1671087771000},"page":"3989-4002","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Attention-guided video super-resolution with recurrent multi-scale spatial\u2013temporal transformer"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6705-1326","authenticated-orcid":false,"given":"Wei","family":"Sun","sequence":"first","affiliation":[]},{"given":"Xianguang","family":"Kong","sequence":"additional","affiliation":[]},{"given":"Yanning","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,12,15]]},"reference":[{"key":"944_CR1","first-page":"1","volume":"34","author":"MF Che Aminudin","year":"2021","unstructured":"Che Aminudin MF, Suandi SA (2021) Video surveillance image enhancement via a convolutional neural network and stacked denoising autoencoder. Neural Comput Appl 34:1\u201317","journal-title":"Neural Comput Appl"},{"doi-asserted-by":"crossref","unstructured":"Kim SY, Oh J, Kim M (2019) Deep SR-ITM: joint learning of super-resolution and inverse tone-mapping for 4k UHD HDR applications. In: IEEE International Conference on Computer Vision, pp 3116\u20133125","key":"944_CR2","DOI":"10.1109\/ICCV.2019.00321"},{"key":"944_CR3","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1016\/j.neucom.2019.05.047","volume":"358","author":"W Sun","year":"2019","unstructured":"Sun W, Sun J, Zhu Y, Hu Y, Ding C, Li H, Zhang Y (2019) Complementary coded aperture set for compressive high-resolution imaging. 
Neurocomputing 358:177\u2013187","journal-title":"Neurocomputing"},{"key":"944_CR4","doi-asserted-by":"publisher","first-page":"2947","DOI":"10.1109\/TIP.2021.3049951","volume":"30","author":"W Sun","year":"2021","unstructured":"Sun W, Gong D, Shi Q, van den Hengel A, Zhang Y (2021) Learning to zoom-in via learning to zoom-out: real-world super-resolution by generating and adapting degradation. IEEE Trans Image Process 30:2947\u20132962","journal-title":"IEEE Trans Image Process"},{"issue":"4","key":"944_CR5","doi-asserted-by":"publisher","first-page":"3089","DOI":"10.1007\/s40747-021-00465-z","volume":"8","author":"B Goyal","year":"2022","unstructured":"Goyal B, Lepcha DC, Dogra A, Wang S-H (2022) A weighted least squares optimization strategy for medical image super resolution via multiscale convolutional neural networks for healthcare applications. Complex Intell Syst 8(4):3089\u20133104","journal-title":"Complex Intell Syst"},{"issue":"3","key":"944_CR6","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/MSP.2003.1203207","volume":"20","author":"SC Park","year":"2003","unstructured":"Park SC, Park MK, Kang MG (2003) Super-resolution image reconstruction: a technical overview. IEEE Signal Process Mag 20(3):21\u201336","journal-title":"IEEE Signal Process Mag"},{"doi-asserted-by":"crossref","unstructured":"Yi P, Wang Z, Jiang K, Jiang J, Ma J (2019) Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations. In: IEEE International Conference on Computer Vision, pp 3106\u20133115","key":"944_CR7","DOI":"10.1109\/ICCV.2019.00320"},{"key":"944_CR8","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1016\/j.neucom.2020.03.068","volume":"406","author":"W Sun","year":"2020","unstructured":"Sun W, Zhang Y (2020) Attention-guided dual spatial-temporal non-local network for video super-resolution. 
Neurocomputing 406:24\u201333","journal-title":"Neurocomputing"},{"key":"944_CR9","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/j.patcog.2019.107139","volume":"100","author":"Q Lai","year":"2020","unstructured":"Lai Q, Nie Y, Sun H, Xu Q, Zhang Z, Xiao M (2020) Video super-resolution via pre-frame constrained and deep-feature enhanced sparse reconstruction. Pattern Recogn 100:107\u2013139","journal-title":"Pattern Recogn"},{"key":"944_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.108577","volume":"126","author":"W Sun","year":"2022","unstructured":"Sun W, Gong D, Shi JQ, van den Hengel A, Zhang Y (2022) Video super-resolution via mixed spatial-temporal convolution and selective fusion. Pattern Recogn 126:108577","journal-title":"Pattern Recogn"},{"doi-asserted-by":"crossref","unstructured":"Fuoli D, Gu S, Timofte R (2019) Efficient video super-resolution through recurrent latent space propagation. In: International Conference on Computer Vision Workshops, pp 3476\u20133485","key":"944_CR11","DOI":"10.1109\/ICCVW.2019.00431"},{"doi-asserted-by":"crossref","unstructured":"Chan KCK, Wang X, Yu K, Dong C, Loy CC (2021) Basicvsr: the search for essential components in video super-resolution and beyond. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 4947\u20134956","key":"944_CR12","DOI":"10.1109\/CVPR46437.2021.00491"},{"unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol\u00a030","key":"944_CR13"},{"doi-asserted-by":"crossref","unstructured":"Vaswani A, Ramachandran P, Srinivas A, Parmar N, Hechtman B, Shlens J (2021) Scaling local self-attention for parameter efficient visual backbones. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp 12894\u201312904","key":"944_CR14","DOI":"10.1109\/CVPR46437.2021.01270"},{"issue":"2","key":"944_CR15","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1007\/s11263-019-01247-4","volume":"128","author":"L Liu","year":"2020","unstructured":"Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietik\u00e4inen M (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261\u2013318","journal-title":"Int J Comput Vis"},{"doi-asserted-by":"crossref","unstructured":"Liu C, Yang H, Fu J, Qian X (2022) Learning trajectory-aware transformer for video super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 5687\u20135696","key":"944_CR16","DOI":"10.1109\/CVPR52688.2022.00560"},{"unstructured":"Cao J, Li Y, Zhang K, Van\u00a0Gool L (2021) Video super-resolution transformer, arXiv preprint arXiv:2106.06847","key":"944_CR17"},{"key":"944_CR18","doi-asserted-by":"publisher","first-page":"8583","DOI":"10.1002\/int.22957","volume":"37","author":"H Xing","year":"2022","unstructured":"Xing H, Xiao Z, Zhan D, Luo S, Dai P, Li K (2022) Selfmatch: robust semisupervised time-series classification with self-distillation. Int J Intell Syst 37:8583\u20138610","journal-title":"Int J Intell Syst"},{"doi-asserted-by":"crossref","unstructured":"Wu S, Song X, Feng Z (2021) MECT: multi-metadata embedding based cross-transformer for Chinese named entity recognition. Association for Computational Linguistics, pp 1529\u20131539","key":"944_CR19","DOI":"10.18653\/v1\/2021.acl-long.121"},{"doi-asserted-by":"crossref","unstructured":"Wang X, Chan KCK, Yu K, Dong C, Loy CC (2019) EDVR: video restoration with enhanced deformable convolutional networks. 
In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, p 0\u20138","key":"944_CR20","DOI":"10.1109\/CVPRW.2019.00247"},{"doi-asserted-by":"crossref","unstructured":"Tao X, Gao H, Liao R, Wang J, Jia J (2017) Detail-revealing deep video super-resolution. In: IEEE International Conference on Computer Vision, pp 4482\u20134490","key":"944_CR21","DOI":"10.1109\/ICCV.2017.479"},{"issue":"8","key":"944_CR22","doi-asserted-by":"publisher","first-page":"1106","DOI":"10.1007\/s11263-018-01144-2","volume":"127","author":"T Xue","year":"2019","unstructured":"Xue T, Chen B, Wu J, Wei D, Freeman WT (2019) Video enhancement with task-oriented flow. Int J Comput Vis 127(8):1106\u20131125","journal-title":"Int J Comput Vis"},{"doi-asserted-by":"crossref","unstructured":"Jo Y, Oh SW, Kang J, Kim SJ (2018) Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 3224\u20133232","key":"944_CR23","DOI":"10.1109\/CVPR.2018.00340"},{"key":"944_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.neucom.2020.04.039","volume":"403","author":"W Sun","year":"2020","unstructured":"Sun W, Sun J, Zhu Y, Zhang Y (2020) Video super-resolution via dense non-local spatial-temporal convolutional network. Neurocomputing 403:1\u201312","journal-title":"Neurocomputing"},{"doi-asserted-by":"crossref","unstructured":"Sajjadi MSM, Vemulapalli R, Brown M (2018) Frame-recurrent video super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 6626\u20136634","key":"944_CR25","DOI":"10.1109\/CVPR.2018.00693"},{"doi-asserted-by":"crossref","unstructured":"Isobe T, Li S, Jia X, Yuan S, Slabaugh G, Xu C, Li Y-L, Wang S, Tian Q (2020) Video super-resolution with temporal group attention. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp 8008\u20138017","key":"944_CR26","DOI":"10.1109\/CVPR42600.2020.00803"},{"doi-asserted-by":"crossref","unstructured":"Zheng S, Lu J, Zhao H, Zhu X, Luo Z, Wang Y, Fu Y, Feng J, Xiang T, Torr PH et al (2021) Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 6881\u20136890","key":"944_CR27","DOI":"10.1109\/CVPR46437.2021.00681"},{"doi-asserted-by":"crossref","unstructured":"Chen H, Wang Y, Guo T, Xu C, Deng Y, Liu Z, Ma S, Xu C, Xu C, Gao W (2021) Pre-trained image processing transformer. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 12299\u201312310","key":"944_CR28","DOI":"10.1109\/CVPR46437.2021.01212"},{"doi-asserted-by":"crossref","unstructured":"Wang Z, Cun X, Bao J, Liu J (2021) Uformer: a general u-shaped transformer for image restoration. arXiv preprint arXiv:2106.03106","key":"944_CR29","DOI":"10.1109\/CVPR52688.2022.01716"},{"doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030","key":"944_CR30","DOI":"10.1109\/ICCV48922.2021.00986"},{"doi-asserted-by":"crossref","unstructured":"Liang J, Cao J, Sun G, Zhang K, Van Gool L, Timofte R (2021) Swinir: image restoration using swin transformer. In: IEEE International Conference on Computer Vision, pp 1833\u20131844","key":"944_CR31","DOI":"10.1109\/ICCVW54120.2021.00210"},{"doi-asserted-by":"crossref","unstructured":"Shi W, Caballero J, Huszar F, Totz J, Aitken AP, Bishop R, Rueckert D, Wang Z (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp 1874\u20131883","key":"944_CR32","DOI":"10.1109\/CVPR.2016.207"},{"doi-asserted-by":"crossref","unstructured":"Tian Y, Zhang Y, Fu Y, Xu C (2020) TDAN: temporally-deformable alignment network for video super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 3357\u20133366","key":"944_CR33","DOI":"10.1109\/CVPR42600.2020.00342"},{"unstructured":"Chan KC, Wang X, Yu K, Dong C, Loy CC (2020) Understanding deformable alignment in video super-resolution. arXiv preprint arXiv:2009.07265","key":"944_CR34"},{"unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et\u00a0al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929","key":"944_CR35"},{"unstructured":"Ba JL, Kiros JR, Hinton GE (2016) Layer normalization. arXiv preprint arXiv:1607.06450","key":"944_CR36"},{"issue":"8","key":"944_CR37","doi-asserted-by":"publisher","first-page":"1106","DOI":"10.1007\/s11263-018-01144-2","volume":"127","author":"T Xue","year":"2019","unstructured":"Xue T, Chen B, Wu J, Wei D, Freeman WT (2019) Video enhancement with task-oriented flow. Int J Comput Vis 127(8):1106\u20131125","journal-title":"Int J Comput Vis"},{"issue":"2","key":"944_CR38","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1109\/TPAMI.2013.127","volume":"36","author":"C Liu","year":"2014","unstructured":"Liu C, Sun D (2014) On Bayesian adaptive video super resolution. IEEE Trans Pattern Anal Mach Intell 36(2):346\u2013360","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"doi-asserted-by":"crossref","unstructured":"Haris M, Shakhnarovich G, Ukita N (2019) Recurrent back-projection network for video super-resolution. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp 3897\u20133906","key":"944_CR39","DOI":"10.1109\/CVPR.2019.00402"},{"unstructured":"Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International Conference on Learning Representations","key":"944_CR40"},{"doi-asserted-by":"crossref","unstructured":"Charbonnier P, Blanc-Feraud L, Aubert G, Barlaud M (1994) Two deterministic half-quadratic regularization algorithms for computed imaging. In: International Conference on Image Processing, vol\u00a02, pp 168\u2013172","key":"944_CR41","DOI":"10.1109\/ICIP.1994.413553"},{"doi-asserted-by":"crossref","unstructured":"Lai W, Huang J, Ahuja N, Yang M (2017) Deep Laplacian pyramid networks for fast and accurate super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 5835\u20135843","key":"944_CR42","DOI":"10.1109\/CVPR.2017.618"},{"doi-asserted-by":"crossref","unstructured":"Haris M, Shakhnarovich G, Ukita N (2018) Deep back-projection networks for super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 1664\u20131673","key":"944_CR43","DOI":"10.1109\/CVPR.2018.00179"},{"doi-asserted-by":"crossref","unstructured":"Yi P, Wang Z, Jiang K, Jiang J, Lu T, Tian X, Ma J (2021) Omniscient video super-resolution. In: IEEE International Conference on Computer Vision, pp 4429\u20134438","key":"944_CR44","DOI":"10.1109\/ICCV48922.2021.00439"},{"doi-asserted-by":"crossref","unstructured":"Geng Z, Liang L, Ding T, Zharkov I (2022) Rstt: real-time spatial temporal transformer for space-time video super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 17441\u201317451","key":"944_CR45","DOI":"10.1109\/CVPR52688.2022.01692"},{"doi-asserted-by":"crossref","unstructured":"Chan KC, Zhou S, Xu X, Loy CC (2022) Basicvsr++: Improving video super-resolution with enhanced propagation and alignment. 
In: IEEE Conference on Computer Vision and Pattern Recognition, pp 5972\u20135981","key":"944_CR46","DOI":"10.1109\/CVPR52688.2022.00588"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00944-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00944-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00944-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T13:17:08Z","timestamp":1690463828000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00944-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,15]]},"references-count":46,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["944"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00944-x","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2022,12,15]]},"assertion":[{"value":"24 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}