{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T15:50:04Z","timestamp":1761061804364,"version":"3.41.0"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,4,13]],"date-time":"2023-04-13T00:00:00Z","timestamp":1681344000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Grant from The National Natural Science Foundation of China","award":["U21A20484"],"award-info":[{"award-number":["U21A20484"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:p>\n            In this article, we present a fast real-time tangled memory network that segments the objects effectively and efficiently for semi-supervised video object segmentation (VOS). We propose a tangled reference encoder and a memory bank organization mechanism based on a state estimator to fully utilize the mask features and alleviate memory overhead and computational burden brought by the unlimited memory bank used in many memory-based methods. First, the tangled memory network exploits the mask features that uncover abundant object information like edges and contours but are not fully explored in existing methods. Specifically, a tangled two-stream reference encoder is designed to extract and fuse the features from both RGB frames and the predicted masks. Second, to indicate the quality of the predicted mask and feed back the online prediction state for organizing the memory bank, we devise a target state estimator to learn the\n            <jats:italic>IoU<\/jats:italic>\n            score between the predicted mask and ground truth. 
Moreover, to accelerate the forward process and avoid memory overflow, we use a memory bank of fixed size to store historical features by designing a new efficient memory bank organization mechanism based on the mask state score provided by the state estimator. We conduct comprehensive experiments on the public benchmarks DAVIS and YouTube-VOS, demonstrating that our method obtains competitive results while running at high speed (66 FPS on the DAVIS16-val set).\n          <\/jats:p>","DOI":"10.1145\/3585076","type":"journal-article","created":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T22:18:25Z","timestamp":1677190705000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Fast Real-Time Video Object Segmentation with a Tangled Memory Network"],"prefix":"10.1145","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3849-2736","authenticated-orcid":false,"given":"Jianbiao","family":"Mei","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4035-0630","authenticated-orcid":false,"given":"Mengmeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9292-6932","authenticated-orcid":false,"given":"Yu","family":"Yang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4676-6223","authenticated-orcid":false,"given":"Yanjun","family":"Li","sequence":"additional","affiliation":[{"name":"Zhejiang University City College, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4822-8939","authenticated-orcid":false,"given":"Yong","family":"Liu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, 
China"}]}],"member":"320","published-online":{"date-parts":[[2023,4,13]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.565"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2699184"},{"key":"e_1_3_1_4_2","first-page":"9384","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Chen Xi","year":"2020","unstructured":"Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jianxin Shen, and Donglian Qi. 2020. State-aware tracker for real-time video object segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 9384\u20139393."},{"key":"e_1_3_1_5_2","first-page":"5559","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Cheng Ho Kei","year":"2021","unstructured":"Ho Kei Cheng, Yu-Wing Tai, and Chi-Keung Tang. 2021. Modular interactive video object segmentation: Interaction-to-mask, propagation and difference-aware fusion. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5559\u20135568."},{"key":"e_1_3_1_6_2","first-page":"11781","article-title":"Rethinking space-time networks with improved memory coverage for efficient video object segmentation","volume":"34","author":"Cheng Ho Kei","year":"2021","unstructured":"Ho Kei Cheng, Yu-Wing Tai, and Chi-Keung Tang. 2021. Rethinking space-time networks with improved memory coverage for efficient video object segmentation. Advances in Neural Information Processing Systems 34 (2021), 11781\u201311794.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_7_2","first-page":"129","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Cho Suhwan","year":"2022","unstructured":"Suhwan Cho, Heansung Lee, Minjung Kim, Sungjun Jang, and Sangyoun Lee. 2022. 
Pixel-level bijective matching for video object segmentation. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 129\u2013138."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_9_2","article-title":"SSTVOS: Sparse spatiotemporal transformers for video object segmentation","author":"Duke Brendan","year":"2021","unstructured":"Brendan Duke, Abdalla Ahmed, Christian Wolf, Parham Aarabi, and Graham W. Taylor. 2021. SSTVOS: Sparse spatiotemporal transformers for video object segmentation. arXiv preprint arXiv:2101.08833 (2021).","journal-title":"arXiv preprint arXiv:2101.08833"},{"key":"e_1_3_1_10_2","first-page":"16836","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Ge Wenbin","year":"2021","unstructured":"Wenbin Ge, Xiankai Lu, and Jianbing Shen. 2021. Video object segmentation using global and instance embedding learning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 16836\u201316845."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_12_2","first-page":"7322","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Heo Yuk","year":"2021","unstructured":"Yuk Heo, Yeong Jun Koh, and Chang-Su Kim. 2021. Guided interactive video object segmentation using reliability-based attention maps. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7322\u20137330."},{"key":"e_1_3_1_13_2","first-page":"4144","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Hu Li","year":"2021","unstructured":"Li Hu, Peng Zhang, Bang Zhang, Pan Pan, Yinghui Xu, and Rong Jin. 2021. Learning position and target consistency for memory-based video object segmentation. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4144\u20134154."},{"key":"e_1_3_1_14_2","first-page":"54","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Hu Yuan-Ting","year":"2018","unstructured":"Yuan-Ting Hu, Jia-Bin Huang, and Alexander G. Schwing. 2018. VideoMatch: Matching based video object segmentation. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). 54\u201370."},{"issue":"2","key":"e_1_3_1_15_2","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1109\/JAS.2021.1004210","article-title":"Scribble-supervised video object segmentation","volume":"9","author":"Huang Peiliang","year":"2021","unstructured":"Peiliang Huang, Junwei Han, Nian Liu, Jun Ren, and Dingwen Zhang. 2021. Scribble-supervised video object segmentation. IEEE\/CAA Journal of Automatica Sinica 9, 2 (2021), 339\u2013353.","journal-title":"IEEE\/CAA Journal of Automatica Sinica"},{"key":"e_1_3_1_16_2","first-page":"8879","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Huang Xuhua","year":"2020","unstructured":"Xuhua Huang, Jiarui Xu, Yu-Wing Tai, and Chi-Keung Tang. 2020. Fast video object segmentation with temporal aggregation network and dynamic template matching. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8879\u20138889."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00488"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_48"},{"key":"e_1_3_1_19_2","first-page":"8953","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Johnander Joakim","year":"2019","unstructured":"Joakim Johnander, Martin Danelljan, Emil Brissman, Fahad Shahbaz Khan, and Michael Felsberg. 2019. A generative appearance model for end-to-end video object segmentation. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8953\u20138962."},{"key":"e_1_3_1_20_2","first-page":"7482","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Kendall Alex","year":"2018","unstructured":"Alex Kendall, Yarin Gal, and Roberto Cipolla. 2018. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7482\u20137491."},{"key":"e_1_3_1_21_2","first-page":"123","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Khoreva Anna","year":"2018","unstructured":"Anna Khoreva, Anna Rohrbach, and Bernt Schiele. 2018. Video object segmentation with language referring expressions. In Proceedings of the Asian Conference on Computer Vision. 123\u2013141."},{"key":"e_1_3_1_22_2","article-title":"Adam: A method for stochastic optimization","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).","journal-title":"arXiv preprint arXiv:1412.6980"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00651"},{"key":"e_1_3_1_24_2","first-page":"1228","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"36","author":"Lan Meng","year":"2022","unstructured":"Meng Lan, Jing Zhang, Fengxiang He, and Lefei Zhang. 2022. Siamese network with interactive transformer for video object segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 1228\u20131236."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00935"},{"key":"e_1_3_1_26_2","first-page":"735","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Li Yu","year":"2020","unstructured":"Yu Li, Zhuoran Shen, and Ying Shan. 
2020. Fast video object segmentation using the global context module. In Proceedings of the European Conference on Computer Vision. 735\u2013750."},{"key":"e_1_3_1_27_2","article-title":"Delving into the cyclic mechanism in semi-supervised video object segmentation","author":"Li Yuxi","year":"2020","unstructured":"Yuxi Li, Ning Xu, Jinlong Peng, John See, and Weiyao Lin. 2020. Delving into the cyclic mechanism in semi-supervised video object segmentation. arXiv preprint arXiv:2010.12176 (2020).","journal-title":"arXiv preprint arXiv:2010.12176"},{"key":"e_1_3_1_28_2","article-title":"Video object segmentation with adaptive feature bank and uncertain-region refinement","author":"Liang Yongqing","year":"2020","unstructured":"Yongqing Liang, Xin Li, Navid Jafari, and Qin Chen. 2020. Video object segmentation with adaptive feature bank and uncertain-region refinement. arXiv preprint arXiv:2010.07958 (2020).","journal-title":"arXiv preprint arXiv:2010.07958"},{"key":"e_1_3_1_29_2","first-page":"3949","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Lin Huaijia","year":"2019","unstructured":"Huaijia Lin, Xiaojuan Qi, and Jiaya Jia. 2019. AGSS-VOS: Attention guided single-shot video object segmentation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 3949\u20133957."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"issue":"10","key":"e_1_3_1_31_2","first-page":"3665","article-title":"Stereo video object segmentation using stereoscopic foreground trajectories","volume":"49","author":"Liu Chang","year":"2018","unstructured":"Chang Liu, Wenguan Wang, Jianbing Shen, and Ling Shao. 2018. Stereo video object segmentation using stereoscopic foreground trajectories. 
IEEE Transactions on Cybernetics 49, 10 (2018), 3665\u20133676.","journal-title":"IEEE Transactions on Cybernetics"},{"issue":"4","key":"e_1_3_1_32_2","first-page":"1607","article-title":"Guided co-segmentation network for fast video object segmentation","volume":"31","author":"Liu Weide","year":"2020","unstructured":"Weide Liu, Guosheng Lin, Tianyi Zhang, and Zichuan Liu. 2020. Guided co-segmentation network for fast video object segmentation. IEEE Transactions on Circuits and Systems for Video Technology 31, 4 (2020), 1607\u20131617.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_33_2","volume-title":"Proceedings of the CVPR Workshop","volume":"2","author":"Liu Yong","year":"2021","unstructured":"Yong Liu, Ran Yu, Xinyuan Zhao, and Yujiu Yang. 2021. Quality-aware and selective prior enhancement memory network for video object segmentation. In Proceedings of the CVPR Workshop, Vol. 2."},{"key":"e_1_3_1_34_2","article-title":"Video object segmentation with episodic graph memory networks","author":"Lu Xinkai","year":"2020","unstructured":"Xinkai Lu, Wenguan Wang, Martin Danelljan, Tianfei Zhou, Jianbing Shen, and Luc Van Gool. 2020. Video object segmentation with episodic graph memory networks. arXiv preprint arXiv:2007.07020 (2020).","journal-title":"arXiv preprint arXiv:2007.07020"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00374"},{"key":"e_1_3_1_36_2","article-title":"Zero-shot video object segmentation with co-attention Siamese networks","author":"Lu Xiankai","year":"2022","unstructured":"Xiankai Lu, Wenguan Wang, Jianbing Shen, David Crandall, and Jiebo Luo. 2022. Zero-shot video object segmentation with co-attention Siamese networks. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 4 (2022), 2228\u20132242.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"11","key":"e_1_3_1_37_2","first-page":"7885","article-title":"Segmenting objects from relational visual data","volume":"44","author":"Lu Xiankai","year":"2021","unstructured":"Xiankai Lu, Wenguan Wang, Jianbing Shen, David J. Crandall, and Luc Van Gool. 2021. Segmenting objects from relational visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 11 (2021), 7885\u20137897.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_38_2","first-page":"8960","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Lu Xiankai","year":"2020","unstructured":"Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David J. Crandall, and Steven C. H. Hoi. 2020. Learning video object segmentation from unlabeled videos. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8960\u20138970."},{"key":"e_1_3_1_39_2","first-page":"565","volume-title":"Computer Vision\u2014ACCV 2018","author":"Luiten Jonathon","year":"2018","unstructured":"Jonathon Luiten, Paul Voigtlaender, and Bastian Leibe. 2018. PReMVOS: Proposal-generation, refinement and merging for video object segmentation. In Computer Vision\u2014ACCV 2018. Lecture Notes in Computer Science, Vol. 11364. Springer, 565\u2013580."},{"key":"e_1_3_1_40_2","article-title":"TransVOS: Video object segmentation with transformers","author":"Mei Jianbiao","year":"2021","unstructured":"Jianbiao Mei, Mengmeng Wang, Yeneng Lin, Yi Yuan, and Yong Liu. 2021. TransVOS: Video object segmentation with transformers. 
arXiv preprint arXiv:2106.00588 (2021).","journal-title":"arXiv preprint arXiv:2106.00588"},{"key":"e_1_3_1_41_2","first-page":"7376","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Oh Seoung Wug","year":"2018","unstructured":"Seoung Wug Oh, Joon-Young Lee, Kalyan Sunkavalli, and Seon Joo Kim. 2018. Fast video object segmentation by reference-guided mask propagation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7376\u20137385."},{"key":"e_1_3_1_42_2","first-page":"9226","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Oh Seoung Wug","year":"2019","unstructured":"Seoung Wug Oh, Joon-Young Lee, Ning Xu, and Seon Joo Kim. 2019. Video object segmentation using space-time memory networks. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 9226\u20139235."},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.372"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.85"},{"key":"e_1_3_1_45_2","article-title":"The 2017 DAVIS Challenge on video object segmentation","author":"Pont-Tuset Jordi","year":"2017","unstructured":"Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbel\u00e1ez, Alex Sorkine-Hornung, and Luc Van Gool. 2017. The 2017 DAVIS Challenge on video object segmentation. arXiv preprint arXiv:1704.00675 (2017).","journal-title":"arXiv preprint arXiv:1704.00675"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01520"},{"key":"e_1_3_1_47_2","first-page":"7406","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Robinson Andreas","year":"2020","unstructured":"Andreas Robinson, Felix Jaremo Lawin, Martin Danelljan, Fahad Shahbaz Khan, and Michael Felsberg. 2020. Learning fast and robust target models for video object segmentation. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7406\u20137415."},{"key":"e_1_3_1_48_2","first-page":"629","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Seong Hongje","year":"2020","unstructured":"Hongje Seong, Junhyuk Hyun, and Euntai Kim. 2020. Kernelized memory network for video object segmentation. In Proceedings of the European Conference on Computer Vision. 629\u2013645."},{"issue":"4","key":"e_1_3_1_49_2","doi-asserted-by":"crossref","first-page":"5558","DOI":"10.1109\/LRA.2020.3007457","article-title":"Real-time fusion network for RGB-D semantic segmentation incorporating unexpected obstacle detection for road-driving images","volume":"5","author":"Sun Lei","year":"2020","unstructured":"Lei Sun, Kailun Yang, Xinxin Hu, Weijian Hu, and Kaiwei Wang. 2020. Real-time fusion network for RGB-D semantic segmentation incorporating unexpected obstacle detection for road-driving images. IEEE Robotics and Automation Letters 5, 4 (2020), 5558\u20135565.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"e_1_3_1_50_2","first-page":"10791","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Sun Mingjie","year":"2020","unstructured":"Mingjie Sun, Jimin Xiao, Eng Gee Lim, Bingfeng Zhang, and Yao Zhao. 2020. Fast template matching and update for video object tracking and segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 10791\u201310799."},{"key":"e_1_3_1_51_2","first-page":"5277","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Ventura Carles","year":"2019","unstructured":"Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques, and Xavier Giro-i-Nieto. 2019. RVOS: End-to-end recurrent network for video object segmentation. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5277\u20135286."},{"key":"e_1_3_1_52_2","first-page":"9481","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Voigtlaender Paul","year":"2019","unstructured":"Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, and Liang-Chieh Chen. 2019. FEELVOS: Fast end-to-end embedding learning for video object segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 9481\u20139490."},{"key":"e_1_3_1_53_2","article-title":"Online adaptation of convolutional neural networks for video object segmentation","author":"Voigtlaender Paul","year":"2017","unstructured":"Paul Voigtlaender and Bastian Leibe. 2017. Online adaptation of convolutional neural networks for video object segmentation. arXiv preprint arXiv:1706.09364 (2017).","journal-title":"arXiv preprint arXiv:1706.09364"},{"key":"e_1_3_1_54_2","first-page":"6578","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Voigtlaender Paul","year":"2020","unstructured":"Paul Voigtlaender, Jonathon Luiten, Philip H. S. Torr, and Bastian Leibe. 2020. Siam R-CNN: Visual tracking by re-detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 6578\u20136588."},{"key":"e_1_3_1_55_2","first-page":"1296","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang Haochen","year":"2021","unstructured":"Haochen Wang, Xiaolong Jiang, Haibing Ren, Yao Hu, and Song Bai. 2021. SwiftNet: Real-time video object segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
1296\u20131305."},{"key":"e_1_3_1_56_2","first-page":"1328","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang Qiang","year":"2019","unstructured":"Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, and Philip H. S. Torr. 2019. Fast online object tracking and segmentation: A unifying approach. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 1328\u20131338."},{"issue":"7","key":"e_1_3_1_57_2","doi-asserted-by":"crossref","first-page":"2413","DOI":"10.1109\/TPAMI.2020.2966453","article-title":"Paying attention to video object pattern understanding","volume":"43","author":"Wang Wenguan","year":"2020","unstructured":"Wenguan Wang, Jianbing Shen, Xiankai Lu, Steven C. H. Hoi, and Haibin Ling. 2020. Paying attention to video object pattern understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 7 (2020), 2413\u20132428.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"4","key":"e_1_3_1_58_2","doi-asserted-by":"crossref","first-page":"985","DOI":"10.1109\/TPAMI.2018.2819173","article-title":"Semi-supervised video object segmentation with super-trajectories","volume":"41","author":"Wang Wenguan","year":"2018","unstructured":"Wenguan Wang, Jianbing Shen, Fatih Porikli, and Ruigang Yang. 2018. Semi-supervised video object segmentation with super-trajectories. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 4 (2018), 985\u2013998.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_59_2","first-page":"3978","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Wang Ziqin","year":"2019","unstructured":"Ziqin Wang, Jun Xu, Li Liu, Fan Zhu, and Ling Shao. 2019. RANet: Ranking attention network for fast video object segmentation. 
In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 3978\u20133987."},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3506716"},{"key":"e_1_3_1_61_2","first-page":"4996","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wu Dongming","year":"2022","unstructured":"Dongming Wu, Xingping Dong, Ling Shao, and Jianbing Shen. 2022. Multi-level representation learning with semantic alignment for referring video object segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4996\u20135005."},{"key":"e_1_3_1_62_2","first-page":"4974","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wu Jiannan","year":"2022","unstructured":"Jiannan Wu, Yi Jiang, Peize Sun, Zehuan Yuan, and Ping Luo. 2022. Language as queries for referring video object segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4974\u20134984."},{"key":"e_1_3_1_63_2","article-title":"Efficient regional memory network for video object segmentation","author":"Xie Haozhe","year":"2021","unstructured":"Haozhe Xie, Hongxun Yao, Shangchen Zhou, Shengping Zhang, and Wenxiu Sun. 2021. Efficient regional memory network for video object segmentation. arXiv preprint arXiv:2103.12934 (2021).","journal-title":"arXiv preprint arXiv:2103.12934"},{"key":"e_1_3_1_64_2","first-page":"585","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Xu Ning","year":"2018","unstructured":"Ning Xu, Linjie Yang, Yuchen Fan, Jianchao Yang, Dingcheng Yue, Yuchen Liang, Brian Price, Scott Cohen, and Thomas Huang. 2018. YouTube-VOS: Sequence-to-sequence video object segmentation. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). 
585\u2013601."},{"key":"e_1_3_1_65_2","first-page":"2946","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"36","author":"Xu Xiaohao","year":"2022","unstructured":"Xiaohao Xu, Jinglu Wang, Xiao Li, and Yan Lu. 2022. Reliable propagation-correction modulation for video object segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 2946\u20132954."},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6944"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2834221"},{"key":"e_1_3_1_68_2","first-page":"332","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Yang Zongxin","year":"2020","unstructured":"Zongxin Yang, Yunchao Wei, and Yi Yang. 2020. Collaborative video object segmentation by foreground-background integration. In Proceedings of the European Conference on Computer Vision. 332\u2013348."},{"key":"e_1_3_1_69_2","article-title":"Associating objects with transformers for video object segmentation","volume":"34","author":"Yang Zongxin","year":"2021","unstructured":"Zongxin Yang, Yunchao Wei, and Yi Yang. 2021. Associating objects with transformers for video object segmentation. Advances in Neural Information Processing Systems 34 (2021).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3391743"},{"key":"e_1_3_1_71_2","first-page":"6949","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang Yizhuo","year":"2020","unstructured":"Yizhuo Zhang, Zhirong Wu, Houwen Peng, and Stephen Lin. 2020. A transductive approach for video object segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
6949\u20136958."},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.108120"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3013162"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3585076","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3585076","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:40Z","timestamp":1750178260000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3585076"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,13]]},"references-count":72,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,6,30]]}},"alternative-id":["10.1145\/3585076"],"URL":"https:\/\/doi.org\/10.1145\/3585076","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"type":"print","value":"2157-6904"},{"type":"electronic","value":"2157-6912"}],"subject":[],"published":{"date-parts":[[2023,4,13]]},"assertion":[{"value":"2022-08-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-13","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-04-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}