{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,2]],"date-time":"2026-02-02T19:58:04Z","timestamp":1770062284512,"version":"3.49.0"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,6,16]],"date-time":"2023-06-16T00:00:00Z","timestamp":1686873600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,6,16]],"date-time":"2023-06-16T00:00:00Z","timestamp":1686873600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton. Intell. Syst."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Space-time video super-resolution (STVSR) aims to reconstruct high-resolution high-frame-rate videos from their low-resolution low-frame-rate counterparts. Recent approaches utilize end-to-end deep learning models to achieve STVSR. They first interpolate intermediate frame features between given frames, then perform local and global refinement over the feature sequence, and finally increase the spatial resolutions of these features. However, in the crucial feature interpolation phase, they only capture spatial-temporal information from the most adjacent frame features, neglecting to model long-term spatial-temporal correlations between multiple neighbouring frames, which are needed to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. 
LTMoE contains multiple experts to extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features, which are then combined with different weights, determined by several gating networks, to obtain the interpolation results. Next, we perform local and global feature refinement using the Locally-temporal Feature Comparison (LFC) module and a bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, indicate the effectiveness and superiority of our approach over the state of the art.<\/jats:p>","DOI":"10.1007\/s43684-023-00051-9","type":"journal-article","created":{"date-parts":[[2023,6,16]],"date-time":"2023-06-16T10:01:56Z","timestamp":1686909716000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Space-time video super-resolution using long-term temporal feature aggregation"],"prefix":"10.1007","volume":"3","author":[{"given":"Kuanhao","family":"Chen","sequence":"first","affiliation":[]},{"given":"Zijie","family":"Yue","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4933-0073","authenticated-orcid":false,"given":"Miaojing","family":"Shi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,6,16]]},"reference":[{"key":"51_CR1","unstructured":"Z. Yue, M. Shi, S. Ding, S. Yang, Enhancing space-time video super-resolution via spatial-temporal feature interaction (2022). http:\/\/arxiv.org\/abs\/2207.08960"},{"key":"51_CR2","volume-title":"AAAI","author":"S.Y. Kim","year":"2022","unstructured":"S.Y. Kim, J. Oh, M. Kim, FISR: deep joint frame interpolation and super-resolution with a multi-scale temporal loss, in AAAI (2022). http:\/\/arxiv.org\/abs\/1912.07213"},{"key":"51_CR3","volume-title":"ECCV","author":"E. Shechtman","year":"2002","unstructured":"E. Shechtman, Y. Caspi, M. 
Irani, Increasing space-time resolution in video, in ECCV (2002)"},{"key":"51_CR4","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1109\/TPAMI.2005.85","volume":"27","author":"E. Shechtman","year":"2005","unstructured":"E. Shechtman, Y. Caspi, M. Irani, Space-time super-resolution. IEEE Trans. Pattern Anal. Mach. Intell. 27, 531\u2013545 (2005). https:\/\/doi.org\/10.1109\/TPAMI.2005.85","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"51_CR5","doi-asserted-by":"publisher","first-page":"3353","DOI":"10.1109\/CVPR.2011.5995360","volume-title":"CVPR","author":"O. Shahar","year":"2011","unstructured":"O. Shahar, A. Faktor, M. Irani, Space-time super-resolution from a single video, in CVPR (2011), pp. 3353\u20133360. https:\/\/doi.org\/10.1109\/CVPR.2011.5995360"},{"key":"51_CR6","volume-title":"CVPR","author":"S. Niklaus","year":"2020","unstructured":"S. Niklaus, F. Liu, Softmax splatting for video frame interpolation, in CVPR (2020). http:\/\/arxiv.org\/abs\/2003.05534"},{"key":"51_CR7","doi-asserted-by":"publisher","first-page":"1106","DOI":"10.1007\/s11263-018-01144-2","volume-title":"IJCV","author":"T. Xue","year":"2019","unstructured":"T. Xue, B. Chen, J. Wu, D. Wei, W.T. Freeman, Video enhancement with task-oriented flow, in IJCV, vol.\u00a0127 (2019), pp. 1106\u20131125. https:\/\/doi.org\/10.1007\/s11263-018-01144-2"},{"key":"51_CR8","volume-title":"IJCV","author":"X. Xu","year":"2019","unstructured":"X. Xu, L. Siyao, W. Sun, Q. Yin, M.-H. Yang, Quadratic video interpolation, in IJCV (2019). http:\/\/arxiv.org\/abs\/1911.00627"},{"issue":"11","key":"51_CR9","doi-asserted-by":"publisher","first-page":"2599","DOI":"10.1109\/TPAMI.2018.2865304","volume":"41","author":"W.-S. Lai","year":"2018","unstructured":"W.-S. Lai, J.-B. Huang, N. Ahuja, M.-H. Yang, Fast and accurate image super-resolution with deep Laplacian pyramid networks. IEEE Trans. Pattern Anal. Mach. Intell. 41(11), 2599\u20132613 (2018). 
http:\/\/arxiv.org\/abs\/1710.01992","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"51_CR10","volume-title":"CVPR","author":"Y. Tian","year":"2018","unstructured":"Y. Tian, Y. Zhang, Y. Fu, C. Xu, TDAN: temporally deformable alignment network for video super-resolution, in CVPR (2018). http:\/\/arxiv.org\/abs\/1812.02898"},{"key":"51_CR11","volume-title":"CVPR Workshop","author":"X. Wang","year":"2019","unstructured":"X. Wang, K.C.K. Chan, K. Yu, C. Dong, C.C. Loy, EDVR video restoration with enhanced deformable convolutional networks, in CVPR Workshop (2019). http:\/\/arxiv.org\/abs\/1905.02716"},{"key":"51_CR12","volume-title":"CVPR","author":"M. Haris","year":"2020","unstructured":"M. Haris, G. Shakhnarovich, N. Ukita, Space-time-aware multi-resolution video enhancement, in CVPR (2020). http:\/\/arxiv.org\/abs\/2003.13170"},{"key":"51_CR13","volume-title":"CVPR","author":"X. Xiang","year":"2020","unstructured":"X. Xiang, Y. Tian, Y. Zhang, Y. Fu, J.P. Allebach, C. Xu, Zooming slow-mo: fast and accurate one-stage space-time video super-resolution, in CVPR (2020). http:\/\/arxiv.org\/abs\/2002.11616"},{"key":"51_CR14","volume-title":"CVPR","author":"G. Xu","year":"2021","unstructured":"G. Xu, J. Xu, Z. Li, L. Wang, X. Sun, M.-M. Cheng, Temporal modulation network for controllable space-time video super-resolution, in CVPR (2021). http:\/\/arxiv.org\/abs\/2104.10642"},{"key":"51_CR15","volume-title":"ICCV","author":"S. Niklaus","year":"2017","unstructured":"S. Niklaus, L. Mai, F. Liu, Video frame interpolation via adaptive separable convolution, in ICCV (2017). http:\/\/arxiv.org\/abs\/1708.01692"},{"key":"51_CR16","volume-title":"CVPR","author":"S. Niklaus","year":"2017","unstructured":"S. Niklaus, L. Mai, F. Liu, Video frame interpolation via adaptive convolution, in CVPR (2017). http:\/\/arxiv.org\/abs\/1703.07514"},{"key":"51_CR17","volume-title":"CVPR","author":"H. Lee","year":"2020","unstructured":"H. Lee, T. Kim, T. Chung, D. Pak, Y. 
Ban, S. Lee, AdaCoF: adaptive collaboration of flows for video frame interpolation, in CVPR (2020). http:\/\/arxiv.org\/abs\/1907.10244"},{"key":"51_CR18","volume-title":"CVPR","author":"W. Bao","year":"2019","unstructured":"W. Bao, W.-S. Lai, C. Ma, X. Zhang, Z. Gao, M.-H. Yang, Depth-aware video frame interpolation, in CVPR (2019). http:\/\/arxiv.org\/abs\/1904.00830"},{"key":"51_CR19","volume-title":"CVPR","author":"K.C.K. Chan","year":"2021","unstructured":"K.C.K. Chan, X. Wang, K. Yu, C. Dong, C.C. Loy, BasicVSR: the search for essential components in video super-resolution and beyond, in CVPR (2021). http:\/\/arxiv.org\/abs\/2012.02181"},{"key":"51_CR20","volume-title":"CVPR","author":"M. Haris","year":"2018","unstructured":"M. Haris, G. Shakhnarovich, N. Ukita, Deep back-projection networks for super-resolution, in CVPR (2018). http:\/\/arxiv.org\/abs\/1803.02735"},{"key":"51_CR21","volume-title":"CVPR","author":"W. Shi","year":"2016","unstructured":"W. Shi, J. Caballero, F. Husz\u00e1r, J. Totz, A.P. Aitken, R. Bishop, D. Rueckert, Z. Wang, Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in CVPR (2016). http:\/\/arxiv.org\/abs\/1609.05158"},{"key":"51_CR22","volume-title":"WACV","author":"C. You","year":"2022","unstructured":"C. You, L. Han, A. Feng, R. Zhao, H. Tang, W. Fan, Megan: memory enhanced graph attention network for space-time video super-resolution, in WACV (2022)"},{"key":"51_CR23","volume-title":"ECCV","author":"J. Cao","year":"2022","unstructured":"J. Cao, J. Liang, K. Zhang, W. Wang, Q. Wang, Y. Zhang, H. Tang, L.V. Gool, Towards interpretable video super-resolution via alternating optimization, in ECCV (2022)"},{"key":"51_CR24","volume-title":"CVPR","author":"M. Hu","year":"2022","unstructured":"M. Hu, K. Jiang, L. Liao, J. Xiao, J. Jiang, Z. 
Wang, Spatial-temporal space hand-in-hand: spatial-temporal video super-resolution via cycle-projected mutual learning, in CVPR (2022)"},{"key":"51_CR25","volume-title":"CVPR","author":"Z. Geng","year":"2022","unstructured":"Z. Geng, L. Liang, T. Ding, I. Zharkov, RSTT: real-time spatial temporal transformer for space-time video super-resolution, in CVPR (2022). http:\/\/arxiv.org\/abs\/2203.14186"},{"key":"51_CR26","doi-asserted-by":"crossref","unstructured":"H. Wang, X. Xiang, Y. Tian, W. Yang, Q. Liao, STDAN: deformable attention network for space-time video super-resolution (2022). http:\/\/arxiv.org\/abs\/2203.06841","DOI":"10.1109\/TNNLS.2023.3243029"},{"key":"51_CR27","volume-title":"CVPR","author":"Z. Liu","year":"2021","unstructured":"Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: hierarchical vision transformer using shifted windows, in CVPR (2021). http:\/\/arxiv.org\/abs\/2103.14030"},{"key":"51_CR28","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1007\/s10462-012-9338-y","volume":"42","author":"S. Masoudnia","year":"2014","unstructured":"S. Masoudnia, R. Ebrahimpour, Mixture of experts: a literature survey. Artif. Intell. Rev. 42, 275\u2013293 (2014). https:\/\/doi.org\/10.1007\/s10462-012-9338-y","journal-title":"Artif. Intell. Rev."},{"key":"51_CR29","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1162\/neco.1991.3.1.79","volume":"3","author":"R.A. Jacobs","year":"1991","unstructured":"R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixtures of local experts. Neural Comput. 3, 79\u201387 (1991). https:\/\/doi.org\/10.1162\/neco.1991.3.1.79","journal-title":"Neural Comput."},{"key":"51_CR30","doi-asserted-by":"publisher","first-page":"1399","DOI":"10.1109\/CVPRW50498.2020.00179","volume-title":"CVPR Workshop","author":"S. Pavlitskaya","year":"2020","unstructured":"S. Pavlitskaya, C. Hubschneider, M. Weber, R. Moritz, F. Huger, P. Schlicht, J.M. 
Zollner, Using mixture of expert models to gain insights into semantic segmentation, in CVPR Workshop (2020), pp. 1399\u20131406. https:\/\/doi.org\/10.1109\/CVPRW50498.2020.00179"},{"key":"51_CR31","unstructured":"Z. Du, M. Shi, J. Deng, S. Zafeiriou, Redesigning multi-scale neural network for crowd counting (2022). http:\/\/arxiv.org\/abs\/2208.02894"},{"key":"51_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.107169","volume":"102","author":"Y. Wang","year":"2020","unstructured":"Y. Wang, L. Wang, H. Wang, P. Li, H. Lu, Blind single image super-resolution with a mixture of deep networks. Pattern Recognit. 102, 107169 (2020). https:\/\/doi.org\/10.1016\/j.patcog.2019.107169","journal-title":"Pattern Recognit."},{"key":"51_CR33","volume-title":"ACCV","author":"D. Liu","year":"2017","unstructured":"D. Liu, Z. Wang, N. Nasrabadi, T. Huang, Learning a mixture of deep networks for single image super-resolution, in ACCV (2017). http:\/\/arxiv.org\/abs\/1701.00823"},{"key":"51_CR34","doi-asserted-by":"publisher","first-page":"4009","DOI":"10.1109\/WACV51458.2022.00406","volume-title":"WACV","author":"M. Emad","year":"2022","unstructured":"M. Emad, M. Peemen, H. Corporaal, MoESR: blind super-resolution using kernel-aware mixture of experts, in WACV (2022), pp. 4009\u20134018. https:\/\/doi.org\/10.1109\/WACV51458.2022.00406"},{"key":"51_CR35","doi-asserted-by":"publisher","first-page":"1024","DOI":"10.1109\/TMI.2017.2780115","volume":"37","author":"R. Rasti","year":"2018","unstructured":"R. Rasti, H. Rabbani, A. Mehridehnavi, F. Hajizadeh, Macular OCT classification using a multi-scale convolutional neural network ensemble. IEEE Trans. Med. Imaging 37, 1024\u20131034 (2018). https:\/\/doi.org\/10.1109\/TMI.2017.2780115","journal-title":"IEEE Trans. Med. Imaging"},{"key":"51_CR36","volume-title":"CVPR","author":"Z. Liu","year":"2022","unstructured":"Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, S. Xie, A ConvNet for the 2020s, in CVPR (2022). 
http:\/\/arxiv.org\/abs\/2201.03545"},{"key":"51_CR37","volume-title":"CVPR","author":"J. Dai","year":"2017","unstructured":"J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, Y. Wei, Deformable convolutional networks, in CVPR (2017). http:\/\/arxiv.org\/abs\/1703.06211"},{"key":"51_CR38","volume-title":"NIPS","author":"X. Shi","year":"2015","unstructured":"X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W. Wong, W. Woo, Convolutional LSTM network: a machine learning approach for precipitation nowcasting, in NIPS (2015). http:\/\/arxiv.org\/abs\/1506.04214"},{"key":"51_CR39","volume-title":"CVPR","author":"Z. Chen","year":"2022","unstructured":"Z. Chen, Y. Chen, J. Liu, X. Xu, V. Goel, Z. Wang, H. Shi, X. Wang, VideoINR: learning video implicit neural representation for continuous space-time super-resolution, in CVPR (2022). http:\/\/arxiv.org\/abs\/2206.04647"},{"key":"51_CR40","volume-title":"CVPR","author":"D. Sun","year":"2018","unstructured":"D. Sun, X. Yang, M.-Y. Liu, J. Kautz, PWC-net: CNNs for optical flow using pyramid, warping, and cost volume, in CVPR (2018). http:\/\/arxiv.org\/abs\/1709.02371"},{"key":"51_CR41","volume-title":"CVPR","author":"S. Nah","year":"2018","unstructured":"S. Nah, T.H. Kim, K.M. Lee, Deep multi-scale convolutional neural network for dynamic scene deblurring, in CVPR (2018). http:\/\/arxiv.org\/abs\/1612.02177"},{"key":"51_CR42","volume-title":"ICLR","author":"I. Loshchilov","year":"2017","unstructured":"I. Loshchilov, F. Hutter, SGDR: stochastic gradient descent with warm restarts, in ICLR (2017). http:\/\/arxiv.org\/abs\/1608.03983"},{"key":"51_CR43","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1109\/TIP.2003.819861","volume":"13","author":"Z. Wang","year":"2004","unstructured":"Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600\u2013612 (2004). 
https:\/\/doi.org\/10.1109\/TIP.2003.819861","journal-title":"IEEE Trans. Image Process."},{"key":"51_CR44","volume-title":"CVPR","author":"H. Jiang","year":"2018","unstructured":"H. Jiang, D. Sun, V. Jampani, M.-H. Yang, E. Learned-Miller, J. Kautz, Super SloMo: high quality estimation of multiple intermediate frames for video interpolation, in CVPR (2018). http:\/\/arxiv.org\/abs\/1712.00080"}],"container-title":["Autonomous Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43684-023-00051-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s43684-023-00051-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43684-023-00051-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,16]],"date-time":"2023-06-16T10:06:18Z","timestamp":1686909978000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s43684-023-00051-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,16]]},"references-count":44,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["51"],"URL":"https:\/\/doi.org\/10.1007\/s43684-023-00051-9","relation":{},"ISSN":["2730-616X"],"issn-type":[{"value":"2730-616X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,16]]},"assertion":[{"value":"26 September 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 February 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 June 
2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 June 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Prof. Miaojing Shi is an editorial board member for Autonomous Intelligent Systems and was not involved in the editorial review, or the decision to publish, this article. All authors declare that there are no other competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"5"}}