{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:06:00Z","timestamp":1750309560486,"version":"3.41.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T00:00:00Z","timestamp":1747872000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Hunan Provincial Postgraduate Scientific Research Innovation Project","award":["CSLGCX23089"],"award-info":[{"award-number":["CSLGCX23089"]}]},{"DOI":"10.13039\/501100001809","name":"China National Natural Science Foundation","doi-asserted-by":"crossref","award":["62172059, 62272160, 62402062, and U22A2030"],"award-info":[{"award-number":["62172059, 62272160, 62402062, and U22A2030"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"China National Key Research and Development Program","award":["2022YFB3103500 and 2024YFF0618800"],"award-info":[{"award-number":["2022YFB3103500 and 2024YFF0618800"]}]},{"name":"Hunan Provincial Key Research and Development Program","award":["2024AQ2027"],"award-info":[{"award-number":["2024AQ2027"]}]},{"name":"Hunan Province Natural Science Foundation, China","award":["2025JJ60415 and 2025JJ50370"],"award-info":[{"award-number":["2025JJ60415 and 2025JJ50370"]}]},{"name":"Changsha City Natural Science Foundation, China","award":["kq2402031"],"award-info":[{"award-number":["kq2402031"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. 
Appl."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>\n            Recently, there has been a growing demand for flow-based video frame interpolation methods, which introduce correlation volumes to supervise the correlation of bidirectional optical flows. However, these methods often overlook the symmetry of the bidirectional motion field and incur substantial computational cost, which is reflected in their long runtimes. To address these issues, we propose a bidirectional 3D correlation volume suited to video frame interpolation. By decomposing the 4D correlation volume into two 3D correlation volumes in the horizontal and vertical directions, we significantly improve the model\u2019s inference speed with only a minor sacrifice compared to our baseline. Additionally, when handling 2K video frames, our method achieves a several-fold improvement in inference speed compared to other methods that employ correlation volumes. 
The code is available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/famt0531\">https:\/\/github.com\/famt0531<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3724123","type":"journal-article","created":{"date-parts":[[2025,3,18]],"date-time":"2025-03-18T17:44:11Z","timestamp":1742319851000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Video Frame Interpolation via Fast Bidirectional 3D Correlation Volume"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2789-2980","authenticated-orcid":false,"given":"Dengyong","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Changsha University of Science and Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-0168-8084","authenticated-orcid":false,"given":"Runqi","family":"Lou","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Changsha University of Science and Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2035-6242","authenticated-orcid":false,"given":"Jiaxin","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Changsha University of Science and Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6581-4633","authenticated-orcid":false,"given":"Xiangling","family":"Ding","sequence":"additional","affiliation":[{"name":"Hunan University of Science and Technology, Xiangtan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9131-0578","authenticated-orcid":false,"given":"Xin","family":"Liao","sequence":"additional","affiliation":[{"name":"Hunan University, Changsha, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2734-659X","authenticated-orcid":false,"given":"Gaobo","family":"Yang","sequence":"additional","affiliation":[{"name":"Hunan University, Changsha, China"}]}],"member":"320","published-online":{"date-parts":[[2025,5,22]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_3_1_2_2","DOI":"10.1109\/ICCV.2007.4408903"},{"doi-asserted-by":"publisher","key":"e_1_3_1_3_2","DOI":"10.1109\/CVPR.2019.00382"},{"doi-asserted-by":"publisher","key":"e_1_3_1_4_2","DOI":"10.1109\/TPAMI.2019.2941941"},{"doi-asserted-by":"publisher","key":"e_1_3_1_5_2","DOI":"10.1109\/TIP.2018.2825100"},{"doi-asserted-by":"publisher","key":"e_1_3_1_6_2","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_1_7_2","first-page":"168","volume-title":"Proceedings of 1st International Conference on Image Processing, Vol","volume":"2","author":"Charbonnier Pierre","year":"1994","unstructured":"Pierre Charbonnier, Laure Blanc-Feraud, Gilles Aubert, and Michel Barlaud. 1994. Two deterministic half-quadratic regularization algorithms for computed imaging. In Proceedings of 1st International Conference on Image Processing, Vol. 2, IEEE, 168\u2013172."},{"unstructured":"Jieneng Chen Yongyi Lu Qihang Yu Xiangde Luo Ehsan Adeli Yan Wang Le Lu Alan L. Yuille and Yuyin Zhou. 2021. Transunet: Transformers make strong encoders for medical image segmentation. arXiv:2102.04306. Retrieved from http:\/\/arxiv.org\/abs\/2102.04306","key":"e_1_3_1_8_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_9_2","DOI":"10.1609\/aaai.v34i07.6634"},{"issue":"10","key":"e_1_3_1_10_2","doi-asserted-by":"crossref","first-page":"7029","DOI":"10.1109\/TPAMI.2021.3100714","article-title":"Multiple video frame interpolation via enhanced deformable separable convolution","volume":"44","author":"Cheng Xianhang","year":"2021","unstructured":"Xianhang Cheng and Zhenzhong Chen. 2021. Multiple video frame interpolation via enhanced deformable separable convolution. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 10 (2021), 7029\u20137045.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"doi-asserted-by":"publisher","key":"e_1_3_1_11_2","DOI":"10.1609\/aaai.v34i07.6693"},{"doi-asserted-by":"publisher","key":"e_1_3_1_12_2","DOI":"10.1145\/3648364"},{"unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. Retrieved from http:\/\/arxiv.org\/abs\/2010.11929","key":"e_1_3_1_13_2"},{"key":"e_1_3_1_14_2","first-page":"6410","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Hu Mengshun","year":"2024","unstructured":"Mengshun Hu, Kui Jiang, Zhihang Zhong, Zheng Wang, and Yinqiang Zheng. 2024. IQ-VFI: Implicit quadratic motion estimation for video frame interpolation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 6410\u20136419."},{"key":"e_1_3_1_15_2","first-page":"3553","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Hu Ping","year":"2022","unstructured":"Ping Hu, Simon Niklaus, Stan Sclaroff, and Kate Saenko. 2022. Many-to-many splatting for efficient video frame interpolation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 3553\u20133562."},{"key":"e_1_3_1_16_2","first-page":"603","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Huang Zilong","year":"2019","unstructured":"Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, and Wenyu Liu. 2019. Ccnet: Criss-cross attention for semantic segmentation. 
In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 603\u2013612."},{"doi-asserted-by":"publisher","key":"e_1_3_1_17_2","DOI":"10.1007\/978-3-031-19781-9_36"},{"doi-asserted-by":"publisher","key":"e_1_3_1_18_2","DOI":"10.1109\/CVPR.2018.00938"},{"key":"e_1_3_1_19_2","first-page":"5049","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Jin Xin","year":"2023","unstructured":"Xin Jin, Longhai Wu, Guotao Shen, Youxin Chen, Jie Chen, Jayoon Koo, and Cheul-hee Hahm. 2023. Enhanced bi-directional motion estimation for video frame interpolation. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, 5049\u20135057."},{"unstructured":"Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980. Retrieved from http:\/\/arxiv.org\/abs\/arXiv:1412.6980","key":"e_1_3_1_20_2"},{"key":"e_1_3_1_21_2","first-page":"1969","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Kong Lingtong","year":"2022","unstructured":"Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Xiaoming Huang, Ying Tai, Chengjie Wang, and Jie Yang. 2022. Ifrnet: Intermediate feature refine network for efficient frame interpolation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 1969\u20131978."},{"doi-asserted-by":"publisher","key":"e_1_3_1_22_2","DOI":"10.1162\/neco.1989.1.4.541"},{"key":"e_1_3_1_23_2","volume-title":"Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Lee Hyeongmin","year":"2020","unstructured":"Hyeongmin Lee, Taeoh Kim, Tae Young Chung, Daehyun Pak, and Sangyoun Lee. 2020. AdaCoF: Adaptive collaboration of flows for video frame interpolation. 
In Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"doi-asserted-by":"publisher","key":"e_1_3_1_24_2","DOI":"10.3390\/ijgi9110635"},{"doi-asserted-by":"publisher","key":"e_1_3_1_25_2","DOI":"10.1109\/CVPR52729.2023.00945"},{"key":"e_1_3_1_26_2","first-page":"1","volume-title":"Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME)","author":"Lin Hezheng","year":"2022","unstructured":"Hezheng Lin, Xing Cheng, Xiangyu Wu, and Dong Shen. 2022. Cat: Cross attention in vision transformer. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1\u20136."},{"key":"e_1_3_1_27_2","first-page":"19125","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Liu Chunxu","year":"2024","unstructured":"Chunxu Liu, Guozhen Zhang, Rui Zhao, and Limin Wang. 2024. Sparse global matching for video frame interpolation with large motion. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 19125\u201319134."},{"doi-asserted-by":"publisher","key":"e_1_3_1_28_2","DOI":"10.1109\/ICCV48922.2021.00986"},{"doi-asserted-by":"publisher","key":"e_1_3_1_29_2","DOI":"10.1109\/CVPR52688.2022.00352"},{"doi-asserted-by":"publisher","key":"e_1_3_1_30_2","DOI":"10.1609\/aaai.v32i1.12276"},{"unstructured":"Christopher Montgomery and H. Lars. 1994. Xiph. org video test media (Derf\u2019s collection). Retrieved from https:\/\/media.xiph.org\/video\/derf","key":"e_1_3_1_31_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_32_2","DOI":"10.1109\/CVPR.2018.00183"},{"doi-asserted-by":"publisher","key":"e_1_3_1_33_2","DOI":"10.1109\/CVPR42600.2020.00548"},{"key":"e_1_3_1_34_2","first-page":"261","volume-title":"IEEE International Conference on Computer Vision","author":"Niklaus Simon","year":"2017","unstructured":"Simon Niklaus, Long Mai, and Feng Liu. 2017. 
Video frame interpolation via adaptive separable convolution. In IEEE International Conference on Computer Vision, 261\u2013270."},{"unstructured":"Moritz Nottebaum Stefan Roth and Simone Schaub-Meyer. 2022. Efficient feature extraction for high-resolution video frame interpolation. arXiv:2211.14005. Retrieved from http:\/\/arxiv.org\/abs\/2211.14005","key":"e_1_3_1_35_2"},{"key":"e_1_3_1_36_2","first-page":"109","volume-title":"Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920)","author":"Park Junheum","year":"2020","unstructured":"Junheum Park, Keunsoo Ko, Chul Lee, and Chang-Su Kim. 2020. Bmbc: Bilateral motion estimation with bilateral cost volume for video interpolation. In Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920). Springer, 109\u2013125."},{"doi-asserted-by":"publisher","key":"e_1_3_1_37_2","DOI":"10.1109\/ICCV48922.2021.01427"},{"issue":"176","key":"e_1_3_1_38_2","doi-asserted-by":"crossref","first-page":"108559","DOI":"10.1016\/j.compbiomed.2024.108559","article-title":"ConvMedSegNet: A multi-receptive field depthwise convolutional neural network for medical image segmentation","author":"Peng Yuxu","year":"2024","unstructured":"Yuxu Peng, Xin Yi, Dengyong Zhang, Lebing Zhang, Yuehong Tian, and Zhifeng Zhou. 2024. ConvMedSegNet: A multi-receptive field depthwise convolutional neural network for medical image segmentation. Computers in Biology and Medicine 176 (2024), 108559.","journal-title":"Computers in Biology and Medicine"},{"issue":"2","key":"e_1_3_1_39_2","first-page":"345","article-title":"Learning for unconstrained space-time video super-resolution","volume":"68","author":"Shi Zhihao","year":"2021","unstructured":"Zhihao Shi, Xiaohong Liu, Chengqi Li, Linhui Dai, Jun Chen, Timothy N. Davidson, and Jiying Zhao. 2021. Learning for unconstrained space-time video super-resolution. 
IEEE Transactions on Broadcasting 68, 2 (2021), 345\u2013358.","journal-title":"IEEE Transactions on Broadcasting"},{"doi-asserted-by":"publisher","key":"e_1_3_1_40_2","DOI":"10.1109\/ICCV48922.2021.01422"},{"unstructured":"Khurram Soomro Amir Roshan Zamir and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv:1212.0402. Retrieved from http:\/\/arxiv.org\/abs\/arXiv:1212.0402","key":"e_1_3_1_41_2"},{"key":"e_1_3_1_42_2","volume-title":"Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Sun Deqing","year":"2018","unstructured":"Deqing Sun, Xiaodong Yang, Ming Yu Liu, and Jan Kautz. 2018. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"doi-asserted-by":"publisher","key":"e_1_3_1_43_2","DOI":"10.1007\/978-3-030-58536-5_24"},{"key":"e_1_3_1_44_2","first-page":"10347","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning. PMLR, 10347\u201310357."},{"key":"e_1_3_1_45_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 
30.","journal-title":"Advances in Neural Information Processing Systems"},{"doi-asserted-by":"publisher","key":"e_1_3_1_46_2","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"e_1_3_1_47_2","first-page":"12119","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Wu Guangyang","year":"2023","unstructured":"Guangyang Wu, Xiaohong Liu, Kunming Luo, Xi Liu, Qingqing Zheng, Shuaicheng Liu, Xinyang Jiang, Guangtao Zhai, and Wenyi Wang. 2023. Accflow: Backward accumulation for long-range optical flow. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 12119\u201312128."},{"key":"e_1_3_1_48_2","first-page":"2753","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wu Guangyang","year":"2024","unstructured":"Guangyang Wu, Xin Tao, Changlin Li, Wenyi Wang, Xiaohong Liu, and Qingqing Zheng. 2024. Perception-oriented video frame interpolation via asymmetric blending. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2753\u20132762."},{"doi-asserted-by":"publisher","key":"e_1_3_1_49_2","DOI":"10.1109\/ICCV48922.2021.01033"},{"issue":"127","key":"e_1_3_1_50_2","first-page":"1106","article-title":"Video enhancement with task-oriented flow","author":"Xue Tianfan","year":"2019","unstructured":"Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, and William T Freeman. 2019. Video enhancement with task-oriented flow. International Journal of Computer Vision 127 (2019), 1106\u20131125.","journal-title":"International Journal of Computer Vision"},{"issue":"4","key":"e_1_3_1_51_2","doi-asserted-by":"crossref","first-page":"1060","DOI":"10.1049\/ipr2.12695","article-title":"Video frame interpolation via residual blocks and feature pyramid networks","volume":"17","author":"Yang Xiaohui","year":"2023","unstructured":"Xiaohui Yang, Haoran Zhang, Zhe Qu, Zhiquan Feng, and Jinglan Tian. 2023. 
Video frame interpolation via residual blocks and feature pyramid networks. IET Image Processing 17, 4 (2023), 1060\u20131070.","journal-title":"IET Image Processing"},{"issue":"89","key":"e_1_3_1_52_2","first-page":"115982","article-title":"Video frame interpolation using deep cascaded network structure","author":"Yang Yoonmo","year":"2020","unstructured":"Yoonmo Yang and Byung Tae Oh. 2020. Video frame interpolation using deep cascaded network structure. Signal Processing: Image Communication 89 (2020), 115982.","journal-title":"Signal Processing: Image Communication"},{"doi-asserted-by":"publisher","key":"e_1_3_1_53_2","DOI":"10.1145\/3547660"},{"doi-asserted-by":"publisher","key":"e_1_3_1_54_2","DOI":"10.1109\/CVPR52729.2023.00550"},{"doi-asserted-by":"crossref","unstructured":"Chang Zhou Jie Liu Jie Tang and Gangshan Wu. 2023. Video frame interpolation with densely queried bilateral correlation. arXiv:2304.13596. Retrieved from http:\/\/arxiv.org\/abs\/2304.13596","key":"e_1_3_1_55_2","DOI":"10.24963\/ijcai.2023\/198"},{"key":"e_1_3_1_56_2","first-page":"286","volume-title":"Proceedings of the 14th European Conference on Computer Vision (ECCV \u201916)","author":"Zhou Tinghui","year":"2016","unstructured":"Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, and Alexei A Efros. 2016. View synthesis by appearance flow. In Proceedings of the 14th European Conference on Computer Vision (ECCV \u201916). Springer, 286\u2013301."},{"unstructured":"Xizhou Zhu Weijie Su Lewei Lu Bin Li Xiaogang Wang and Jifeng Dai. 2020. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv:2010.04159. 
Retrieved from http:\/\/arxiv.org\/abs\/2010.04159","key":"e_1_3_1_57_2"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3724123","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3724123","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:59Z","timestamp":1750295939000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3724123"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,22]]},"references-count":56,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3724123"],"URL":"https:\/\/doi.org\/10.1145\/3724123","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2025,5,22]]},"assertion":[{"value":"2024-06-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-28","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}