{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T19:12:54Z","timestamp":1770750774274,"version":"3.50.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62272160, 62302163, and 62372164"],"award-info":[{"award-number":["62272160, 62302163, and 62372164"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Research Foundation of the Department of Natural Resources of Hunan Province","award":["HBZ20240107"],"award-info":[{"award-number":["HBZ20240107"]}]},{"name":"Opening Project of Liaoning Collaboration Innovation Center for CSLE","award":["XTCX2024-004"],"award-info":[{"award-number":["XTCX2024-004"]}]},{"DOI":"10.13039\/501100004761","name":"Hunan Province Natural Science Foundation","doi-asserted-by":"crossref","award":["2025JJ50370"],"award-info":[{"award-number":["2025JJ50370"]}],"id":[{"id":"10.13039\/501100004761","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>\n                    With the maturation of Deep Learning-based Video Frame Interpolation (Deep VFI), the left spatial-temporal inconsistency in the synthesis process is greatly improved, which poses a challenge to the current VFI detector. This article presents a dual-stream identification network based on Bi-level Routing Attention and enhanced Spatial-Temporal inconsistency learning (BRA-ST) to address this challenge. Specifically, the spatial inconsistencies in Deep VFI are mainly reflected in their motion regions and moving object edges; thus, the high-pass filter is introduced to enhance them, facilitating the three-stage pyramid structure of BiFormer Blocks with bi-level routing attention in the frame-level stream to learn. To fully exploit the temporal inconsistencies in the Deep VFI video, the time-difference module in the time-level stream is superimposed with the ConvGRU to extract the temporally dependent features of continuous multiple frames. Additionally, the middle layer of the two streams interacts and aggregates with the channel attention, and then, their last layer adaptively merges from a whole and part perspective for the ultimate frame prediction. Finally, the experimental findings on a constructed dataset by the five most advanced Deep VFI methods indicate that the proposed BRA-ST achieved\n                    <jats:inline-formula content-type=\"math\/tex\">\n                      <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(F_{\\text{1Score}}\\)<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    of 99.73%, which is superior to the existing Deep VFI detectors, and further verify that the resolution of BRA-ST for different Deep VFI methods reached 78.55%. Our source codes and dataset are available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/pan.baidu.com\/s\/1f05_gS0qu5G-SSIkd9F4Hw?pwd=j6t6\">https:\/\/pan.baidu.com\/s\/1f05_gS0qu5G-SSIkd9F4Hw?pwd=j6t6<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3767749","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:16:46Z","timestamp":1758028606000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Bi-Level Routing Attention and Enhanced Spatial-Temporal Inconsistency Learning for Deep VFI Video Detection"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6581-4633","authenticated-orcid":false,"given":"Xiangling","family":"Ding","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China and Liaoning Collaboration Innovation Center for CSLE, Shenyang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-9234-0881","authenticated-orcid":false,"given":"Jia","family":"Tang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9058-5767","authenticated-orcid":false,"given":"Yunyi","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2734-659X","authenticated-orcid":false,"given":"Gaobo","family":"Yang","sequence":"additional","affiliation":[{"name":"Hunan University, Changsha, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5600-0973","authenticated-orcid":false,"given":"Yubo","family":"Lang","sequence":"additional","affiliation":[{"name":"The College of Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang, China and Liaoning Collaboration Innovation Center for CSLE, Shenyang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,2,10]]},"reference":[{"issue":"6","key":"e_1_3_1_2_2","first-page":"1267","article-title":"Multiple description video coding using joint frame duplication\/interpolation","volume":"29","author":"Bai Huihui","year":"2010","unstructured":"Huihui Bai, Yao Zhao, Ce Zhu, and Anhong Wang. 2010. Multiple description video coding using joint frame duplication\/interpolation. Computing and Informatics 29, 6+ (2010), 1267\u20131282.","journal-title":"Computing and Informatics"},{"key":"e_1_3_1_3_2","first-page":"564","volume-title":"Proceedings of the 2009 17th European Signal Processing Conference","author":"Baroncini Vittorio","year":"2009","unstructured":"Vittorio Baroncini, Licia Capodiferro, Elio D. Di Claudio, and Giovanni Jacovitti. 2009. The polar edge coherence: A quasi blind metric for video quality assessment. In Proceedings of the 2009 17th European Signal Processing Conference. IEEE, 564\u2013568."},{"key":"e_1_3_1_4_2","first-page":"3033","volume-title":"Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Bestagini Paolo","year":"2013","unstructured":"Paolo Bestagini, S. Battaglia, Simone Milani, Marco Tagliasacchi, and Stefano Tubaro. 2013. Detection of temporal interpolation in video sequences. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 3033\u20133037."},{"issue":"1","key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1109\/TIP.2017.2760518","article-title":"Deep neural networks for no-reference and full-reference image quality assessment","volume":"27","author":"Bosse Sebastian","year":"2017","unstructured":"Sebastian Bosse, Dominique Maniry, Klaus-Robert M\u00fcller, Thomas Wiegand, and Wojciech Samek. 2017. Deep neural networks for no-reference and full-reference image quality assessment. IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society 27, 1 (2017), 206\u2013219.","journal-title":"IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.195"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3648364"},{"key":"e_1_3_1_8_2","first-page":"1","volume-title":"Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS)","author":"Ding Xiangling","year":"2020","unstructured":"Xiangling Ding and Yanming Huang. 2020. Identification of frame-rate up-conversion based on spatial-temporal edge and occlusion with convolutional neural network. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1\u20135."},{"issue":"39","key":"e_1_3_1_9_2","doi-asserted-by":"crossref","first-page":"28729","DOI":"10.1007\/s11042-020-09340-4","article-title":"Forgery detection of motion compensation interpolated frames based on discontinuity of optical flow","volume":"79","author":"Ding Xiangling","year":"2020","unstructured":"Xiangling Ding, Yanming Huang, Yue Li, and Jiale He. 2020. Forgery detection of motion compensation interpolated frames based on discontinuity of optical flow. Multimedia Tools and Applications 79, 39 (2020), 28729\u201328754.","journal-title":"Multimedia Tools and Applications"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME51207.2021.9428182"},{"issue":"7","key":"e_1_3_1_11_2","doi-asserted-by":"crossref","first-page":"1497","DOI":"10.1109\/TCSVT.2017.2676162","article-title":"Identification of motion-compensated frame rate up-conversion based on residual signals","volume":"28","author":"Ding Xiangling","year":"2017","unstructured":"Xiangling Ding, Gaobo Yang, Ran Li, Lebing Zhang, Yue Li, and Xingming Sun. 2017. Identification of motion-compensated frame rate up-conversion based on residual signals. IEEE Transactions on Circuits and Systems for Video Technology 28, 7 (2017), 1497\u20131512.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"1885","DOI":"10.1109\/LSP.2024.3427721","article-title":"ERaL: Exceptional regions-aware deep video interpolation localization","author":"Ding Xiangling","year":"2024","unstructured":"Xiangling Ding, Yulin Zhao, Qing Gu, Dengyong Zhang, and Gaobo Yang. 2024. ERaL: Exceptional regions-aware deep video interpolation localization. IEEE Signal Processing Letters 31 (2024), 1885\u20131889.","journal-title":"IEEE Signal Processing Letters"},{"issue":"7","key":"e_1_3_1_13_2","doi-asserted-by":"crossref","first-page":"1893","DOI":"10.1109\/TCSVT.2018.2852799","article-title":"Robust localization of interpolated frames by motion-compensated frame interpolation based on an artifact indicated map and tchebichef moments","volume":"29","author":"Ding Xiangling","year":"2018","unstructured":"Xiangling Ding, Ningbo Zhu, Leida Li, Yue Li, and Gaobo Yang. 2018. Robust localization of interpolated frames by motion-compensated frame interpolation based on an artifact indicated map and tchebichef moments. IEEE Transactions on Circuits and Systems for Video Technology 29, 7 (2018), 1893\u20131906.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01181"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2020.116066"},{"key":"e_1_3_1_16_2","first-page":"733","volume-title":"Proceedings of the 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)","author":"Gu Qing","year":"2022","unstructured":"Qing Gu, Xiangling Ding, Dengyong Zhang, and Ce Yang. 2022. Forgery detection scheme of deep video frame-rate up-conversion based on dual-stream multi-scale spatial-temporal representation. In Proceedings of the 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). IEEE, 733\u2013738."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3408293"},{"issue":"2","key":"e_1_3_1_18_2","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1007\/s11554-018-0767-y","article-title":"Hierarchical prediction-based motion vector refinement for video frame-rate up-conversion","volume":"17","author":"He Jiale","year":"2020","unstructured":"Jiale He, Gaobo Yang, Jingyu Song, Xiangling Ding, and Ran Li. 2020. Hierarchical prediction-based motion vector refinement for video frame-rate up-conversion. Journal of Real-Time Image Processing 17, 2 (2020), 259\u2013273.","journal-title":"Journal of Real-Time Image Processing"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19781-9_36"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-017-4519-y"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV56688.2023.00211"},{"key":"e_1_3_1_23_2","unstructured":"Diederik P. Kingma. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980. Retrieved from https:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00201"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00839"},{"issue":"12","key":"e_1_3_1_26_2","doi-asserted-by":"crossref","first-page":"1010","DOI":"10.1109\/JDT.2014.2334598","article-title":"Multi-channel mixed-pattern based frame rate up-conversion using spatio-temporal motion vector refinement and dual-weighted overlapped block motion compensation","volume":"10","author":"Li Ran","year":"2014","unstructured":"Ran Li, Zongliang Gan, Ziguan Cui, Guijin Tang, and Xiuchang Zhu. 2014. Multi-channel mixed-pattern based frame rate up-conversion using spatio-temporal motion vector refinement and dual-weighted overlapped block motion compensation. Journal of Display Technology 10, 12 (2014), 1010\u20131023.","journal-title":"Journal of Display Technology"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-016-4268-3"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3278310"},{"issue":"1","key":"e_1_3_1_29_2","first-page":"45","article-title":"Motion-compensated frame interpolation with multiframe-based occlusion handling","volume":"12","author":"Lu Qingchun","year":"2015","unstructured":"Qingchun Lu, Ning Xu, and Xiangzhong Fang. 2015. Motion-compensated frame interpolation with multiframe-based occlusion handling. Journal of Display Technology 12, 1 (2015), 45\u201354.","journal-title":"Journal of Display Technology"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.85"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01058"},{"key":"e_1_3_1_32_2","doi-asserted-by":"crossref","first-page":"71973","DOI":"10.1007\/s11042-024-18263-3","article-title":"Detection and localization of multiple inter-frame forgeries in digital videos","author":"Shehnaz","year":"2024","unstructured":"Shehnaz, Mandeep Kaur. 2024. Detection and localization of multiple inter-frame forgeries in digital videos. Multimedia Tools and Applications 83 (2024), 71973\u201372005.","journal-title":"Multimedia Tools and Applications"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2005.859378"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01422"},{"key":"e_1_3_1_35_2","unstructured":"Khurram Soomro Amir Roshan Zamir and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv:1212.0402. Retrieved from https:\/\/arxiv.org\/abs\/1212.0402"},{"key":"e_1_3_1_36_2","first-page":"III","volume-title":"Proceedings of the 2007 IEEE International Conference on Image Processing","volume":"3","author":"Suzuki Yoshinori","year":"2007","unstructured":"Yoshinori Suzuki, Choong Seng Boon, and Thiow Keng Tan. 2007. Inter frame coding with template matching averaging. In Proceedings of the 2007 IEEE International Conference on Image Processing, Vol. 3, IEEE, III\u2013409."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2020.3002101"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00193"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-016-3468-1"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2025.126416"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11554-019-00865-y"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jisa.2015.12.001"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.forsciint.2022.111442"},{"issue":"2","key":"e_1_3_1_45_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3664654","article-title":"Spatiotemporal inconsistency learning and interactive fusion for deepfake video detection","volume":"21","author":"Zhang Dengyong","year":"2024","unstructured":"Dengyong Zhang, Wenjie Zhu, Xin Liao, Feifan Qi, Gaobo Yang, and Xiangling Ding. 2024. Spatiotemporal inconsistency learning and interactive fusion for deepfake video detection. ACM Transactions on Multimedia Computing, Communications and Applications 21, 2 (2024), 1\u201324.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00550"},{"key":"e_1_3_1_47_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang Zihao","year":"2025","unstructured":"Zihao Zhang, Haoran Chen, Haoyu Zhao, Guansong Lu, Yanwei Fu, Hang Xu, and Zuxuan Wu. 2025. Enhanced diffusion for high-quality large-motion video frame interpolation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-021-03386-4"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00995"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3767749","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T12:13:32Z","timestamp":1770725612000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3767749"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,10]]},"references-count":48,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3767749"],"URL":"https:\/\/doi.org\/10.1145\/3767749","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,10]]},"assertion":[{"value":"2025-05-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}